David Choffnes

Professor, Executive Director - Cybersecurity and Privacy Institute

Northeastern University · Cybersecurity and Information Systems

Active 2003–2026

h-index: 39

Citations: 5.3k

Papers: 166 (50 in the last 5 years)

Funding: $3.4M (1 active)

About

David Choffnes is a Professor and the Executive Director of the Cybersecurity and Privacy Institute at Khoury. In this role he leads interdisciplinary efforts in cybersecurity and privacy and works with Khoury to advance research and education in these fields.

Research topics

Computer Science
Computer Security
Internet privacy
World Wide Web
Political Science
Human–computer interaction
Sociology
Psychology
Operating system
Computer network
Speech recognition
Multimedia
Database
Linguistics
Data science

Selected publications

Beyond the Hype: Empirical Analysis of Matter Standard's Security and Privacy
Zenodo (CERN European Organization for Nuclear Research) · 2026-10-12
article · Open access
This paper presents an empirical analysis of the security and privacy properties of the Matter IoT standard. Moving beyond theoretical design claims, the authors evaluate real-world implementations to identify practical weaknesses and deployment challenges. The study examines device onboarding, communication protocols, and access control mechanisms, highlighting gaps between the specification and actual behavior. It also explores privacy risks related to metadata exposure and ecosystem interoperability. The findings reveal that while Matter introduces meaningful security improvements, inconsistencies in implementation and ecosystem complexity can undermine its guarantees. The paper concludes with recommendations to strengthen security practices and improve privacy protections in future deployments.
Dataset: The dataset catalogues Matter and non-Matter IoT devices, including bulbs, plugs, bridges, controllers, and miscellaneous devices, across multiple brands, enabling comparison of functionality, compatibility, and ecosystem diversity in smart home deployments. Network traces have been limited to 300 MB per device.
Scripts: Rorating-Device-ID-analysis-script: a function that analyzes all PCAPs in the specified directory, extracts RI values, counts occurrences, and plots a histogram that uses a different color for each key (k1, k2, …).
Publisher DOI
SPHERE CPS Enclave: A Reconfigurable Testbed for Industrial Control System Security Experimentation
2025-05-06
article
Cyber-physical systems (CPS) increasingly face security threats that can disrupt critical infrastructure operations. The SPHERE CPS enclave is a modular, remotely accessible industrial control system (ICS) testbed designed to support security experimentation on programmable logic controllers (PLCs), industrial networks, and digital twin simulations. It enables researchers to investigate cyber-physical attacks, anomaly detection, and intrusion resilience strategies. Unlike general cybersecurity testbeds, SPHERE's CPS enclave provides a configurable, realistic environment for studying adversarial scenarios that bridge cyber and physical domains. The infrastructure offers controlled, reproducible experiments with customizable network topologies and hardware-in-the-loop validation. This poster presents the design philosophy, community-driven experimental goals, and deployment considerations of the SPHERE CPS enclave, demonstrating its potential for advancing CPS security research.
Publisher DOI
Empirically Measuring Data Localization in the EU
Proceedings on Privacy Enhancing Technologies · 2025-05-19 · 1 citation
article · Open access · Senior author
EU data localization regulations limit data transfers to non-EU countries with the GDPR. However, BGP, DNS and other Internet protocols were not designed to enforce jurisdictional constraints, so implementing data localization is challenging. Despite initial research on the topic, little is known about if or how companies currently operate their server infrastructure to comply with the regulations. We close this knowledge gap by empirically measuring the extent to which servers and routers that process EU requests are located outside of the EU (and a handful of 'adequate' non-EU countries). The key challenge is that both browser measurements (to infer relevant endpoints) and data-plane measurements (to infer relevant IP addresses) are needed, but no large-scale public infrastructure allows both. We build a novel methodology that combines BrightData (browser) and RIPE Atlas (data-plane) probes, with joint measurements from over 1,000 networks in 20 EU countries. We find that, on average, 2.2% of servers serving users in each EU country are located in non-adequate destination countries (1.4% of known trackers). Our findings suggest that data localization policies are largely being followed by content providers, though there are exceptions.
Publisher DOI
Promises, Promises: Understanding Claims Made in Social Robot Consumer Experiences
2025-04-24 · 1 citation
article · Open access
Social robots are a class of emerging smart consumer electronics devices that promise sophisticated experiences featuring emotive capabilities, artificial intelligence, conversational interaction, and more. With unique risk factors like emotional attachment, little is known on how social robots communicate these promises to consumers and whether they adequately deliver upon them within their overall product experiences prior to and during user interaction. Animated by a consumer protection lens, this paper systematically investigates manufacturer claims made for four commercially available social robots, evaluating these claims against the provided user experience and consumer reviews. We find that social robots vary widely in the manner and extent to which they communicate intelligent features and the supposed benefits of these features, while consumer perspectives similarly include a wide range of perceptions on robot and AI performance, capabilities, and product frustrations. We conclude by discussing social robots’ unique propensities for consumer risk, and consider implications for regulators, developers, and researchers of social robots.
Publisher DOI
Echoes of Privacy: Uncovering the Profiling Practices of Voice Assistants
Proceedings on Privacy Enhancing Technologies · 2025-03-07
article · Open access · Senior author
Many companies, including Google, Amazon, and Apple, offer voice assistants as a convenient solution for answering general voice queries and accessing their services. These voice assistants have gained popularity and can be easily accessed through various smart devices such as smartphones, smart speakers, smartwatches, and an increasing array of other devices. However, this convenience comes with potential privacy risks. For instance, while companies vaguely mention in their privacy policies that they may use voice interactions for user profiling, it remains unclear to what extent this profiling occurs and whether voice interactions pose greater privacy risks compared to other interaction modalities. In this paper, we conduct 1171 experiments involving 24530 queries with different personas and interaction modalities during 20 months to characterize how the three most popular voice assistants profile their users. We analyze factors such as labels assigned to users, their accuracy, the time taken to assign these labels, differences between voice and web interactions, and the effectiveness of profiling remediation tools offered by each voice assistant. Our findings reveal that profiling can happen without interaction, can be incorrect and inconsistent at times, may take several days or weeks to change, and is affected by the interaction modality.
Publisher OA PDF DOI
MLCerts Docker Images (ICSE 2026)
Zenodo (CERN European Organization for Nuclear Research) · 2025-12-08
other · Open access · Senior author
Licensed under a Creative Commons Attribution 4.0 International License.

Auxiliary material, up-to-date documentation, and issue tracking are available at: https://github.com/rub-softsec/MLCerts
The Datasets and Language Models are available at: https://zenodo.org/records/15971208

This archive contains a Docker image for the Differential Testing Framework, an image with a patched Transcert implementation, and an image for generating synthetic certificates using a pre-trained model.

MLCerts Differential Testing Framework

Run the image with the corresponding data directory mounted. The outputs of the testing framework will appear in the /attached_dir/testing-results and /attached_dir/coverage directories.

    docker load -i mlcerts_export.tar
    docker run -it -v ./attached_dir_export:/attached_dir mlcerts_export bash
    conda deactivate
    cd /attached_dir
    ./mlcerts/run_testing.sh . ./cert_data_pem/v3-experiments/ ./mlcerts/ ./LIBS/ ./customCA/cacert.pem

Transcert

Run the image with the corresponding data directory mounted to find the patched Transcert source code (based on https://github.com/joky27/transcert_related).

    docker load -i transcert_export.tar
    docker run -it -v ./attached_dir_export:/attached_dir transcert_export bash
    conda activate transcert
    cd /attached_dir

The key modifications are to:
- Use fastcov instead of lcov.
- Use gmtime_adj* functions instead of set* due to a pyOpenSSL bug (https://github.com/pyca/pyopenssl/issues/311) that has not been fixed due to API deprecation.

Language Models

The LM code requires installation of CUDA drivers specific to the GPUs available. For a simple demonstration, we release a container that relies on the main model used in the paper and uses the CPU to generate certificates. The data directory to attach to the container needs to be downloaded: llm-code-mlcerts-export.zip from https://zenodo.org/records/15971208.

Run the image with the corresponding data directory mounted:

    docker load -i mlcerts-llm-cpu-demo.tar
    docker run -it -v ./MLcerts-EXPORT:/MLcerts-EXPORT mlcerts-llm-cpu-demo /bin/bash

Then, to generate synthetic certificates using the final model used in the paper (IPv4/RNN-Medium with Temperature = 1.5):

    cd /MLcerts-EXPORT/Char-RNN-PyTorch
    conda activate py39
    python3 generate.py zmap-data-1024-3-0.0002lr-0.1dropout-epoch3-step300000 1024 3 1.5 zmap-data testZmap1M

The synthetic ASN outputs will appear in the ./outputCerts directory. Due to the reliance on CPU, it may take ~10 minutes per output.

Finally, to convert ASN outputs to usable PEM formats:

    cd /MLcerts-EXPORT/
    conda activate myenv
    python3 asn1_to_pem.py <asn_file_path> <output_dir_path> 2

For example:

    python3 asn1_to_pem.py Char-RNN-PyTorch/outputCerts/zmap-data-1024-3-0.0002lr-0.1dropout-epoch3-step300000testZmap1M/fbbff4ee-67f0-423b-8647-5e11754ebdf3.asn . 2

An output.XYZ.pem file is generated, using CA information from the customCA/ directory.

BibTeX

Please cite our paper if you rely on our artifacts for your work.

    @inproceedings{icse2026-hallucinating-certificates,
      title = {{Hallucinating Certificates: Differential Testing of TLS Certificate Validation Using Generative Language Models}},
      author = {Paracha, Talha and Posluns, Kyle and Borgolte, Kevin and Lindorfer, Martina and Choffnes, David},
      booktitle = {Proceedings of the 48th IEEE/ACM International Conference on Software Engineering (ICSE)},
      date = {2026-04},
      edition = {48},
      editor = {Mezini, Mira and Zimmermann, Thomas},
      location = {Rio de Janeiro, Brazil},
      publisher = {Association for Computing Machinery (ACM)/Institute of Electrical and Electronics Engineers (IEEE)}
    }
Publisher DOI
Dark Patterns as Disloyal Design
SSRN Electronic Journal · 2025-01-01
preprint · Open access
Publisher DOI
Empirically Measuring Data Localization in the EU
ArXiv.org · 2025-04-12
preprint · Open access · Senior author
EU data localization regulations limit data transfers to non-EU countries with the GDPR. However, BGP, DNS and other Internet protocols were not designed to enforce jurisdictional constraints, so implementing data localization is challenging. Despite initial research on the topic, little is known about if or how companies currently operate their server infrastructure to comply with the regulations. We close this knowledge gap by empirically measuring the extent to which servers and routers that process EU requests are located outside of the EU (and a handful of 'adequate' non-EU countries). The key challenge is that both browser measurements (to infer relevant endpoints) and data-plane measurements (to infer relevant IP addresses) are needed, but no large-scale public infrastructure allows both. We build a novel methodology that combines BrightData (browser) and RIPE Atlas (data-plane) probes, with joint measurements from over 1,000 networks in 19 EU countries. We find that, on average, 2.3% of servers serving users in each EU country are located in non-adequate destination countries (1.4% of known trackers). Our findings suggest that data localization policies are largely being followed by content providers, though there are exceptions.
Publisher OA PDF DOI

Recent grants

CI-New: Collaborative Research: An Open Platform for Internet Routing Experiments
NSF · $361k · 2015–2018
SaTC: Frontiers: Collaborative: Protecting Personal Data Flow on the Internet
NSF · $1.7M · 2020–2026
TWC: Small: Efficient Traffic Analysis Resistance for Anonymity Networks
NSF · $508k · 2016–2020
CNS Core: Small: BehavIoT: Modeling and Controlling Internet of Things Behavior Using Network-Inferred State Machines
NSF · $507k · 2019–2022
NeTS: Small: A Principled Approach to Enabling Policy Transparency for Mobile Networks
NSF · $308k · 2016–2020

Frequent coauthors

Ashwin Rao
University of Helsinki
45 shared
Martina Lindorfer
TU Wien
44 shared
Narseo Vallina-Rodriguez
43 shared
Álvaro Feal
Universidad del Noreste
38 shared
Amogh Pradeep
Boston University
38 shared
Julien Gamba
Universidad Carlos III de Madrid
37 shared
Alan Mislove
Northeastern University
34 shared
Fabián E. Bustamante
Northwestern University
30 shared

Labs

Cybersecurity and Privacy Institute · PI

Education

Ph.D., Computer Science
Massachusetts Institute of Technology
1996
M.S., Computer Science
Massachusetts Institute of Technology
1993
B.S., Computer Science
University of California, Berkeley
1989
