Mark Bun

· Assistant ProfessorVerified

Boston University · Computer Science

Active 2013–2026

h-index21

Citations1.9k

Papers10438 last 5y

Funding$175k

Faculty page Lab page

See your match with Mark Bun — sign in to PhdFit.Sign in

About

Mark Bun is an assistant professor in the Department of Computer Science at Boston University. He completed his Ph.D. in computer science at Harvard University in 2016 under the supervision of Salil Vadhan. Prior to joining Boston University, he was a Google Research Fellow at the Simons Institute for the Theory of Computing at UC Berkeley, where he worked on lower bounds and data privacy. He also held a postdoctoral research position in the Theory of Computation Group at Princeton University, hosted by Mark Zhandry. As an undergraduate, he studied mathematics and computer science at the University of Washington. Professor Bun's research broadly spans theoretical computer science, with particular interests in data privacy, computational complexity, cryptography, and the foundations of machine learning. His current research focuses on applying complexity theory methodologies to address practically motivated questions in algorithmic data privacy. Additionally, he investigates the power of real polynomial approximations to Boolean functions and their applications in quantum computation, communication complexity, and learning theory. Throughout his career, Mark Bun has contributed significantly to the understanding of differential privacy and private learning, exploring the computational and statistical aspects of privacy-preserving algorithms. His work bridges theoretical insights with practical challenges in privacy and learning, advancing the foundations of these fields.

Research topics

Machine Learning
Computer Science
Artificial Intelligence
Theoretical computer science
Data Mining
Algorithm
Mathematics
Discrete mathematics

Selected publications

Separating Oblivious and Adaptive Differential Privacy under Continual Observation
arXiv (Cornell University) · 2026-03-11
articleOpen access1st authorCorresponding
We resolve an open question of Jain, Raskhodnikova, Sivakumar, and Smith (ICML 2023) by exhibiting a problem separating differential privacy under continual observation in the oblivious and adaptive settings. The continual observation (a.k.a. continual release) model formalizes privacy for streaming algorithms, where data is received over time and output is released at each time step. In the oblivious setting, privacy need only hold for data streams fixed in advance; in the adaptive setting, privacy is required even for streams that can be chosen adaptively based on the streaming algorithm's output. We describe the first explicit separation between the oblivious and adaptive settings. The problem showing this separation is based on the correlated vector queries problem of Bun, Steinke, and Ullman (SODA 2017). Specifically, we present an $(\varepsilon,0)$-DP algorithm for the oblivious setting that remains accurate for exponentially many time steps in the dimension of the input. On the other hand, we show that every $(\varepsilon,δ)$-DP adaptive algorithm fails to be accurate after releasing output for only a constant number of time steps.
Publisher OA PDF
Separating Oblivious and Adaptive Differential Privacy under Continual Observation
arXiv (Cornell University) · 2026-03-11
preprintOpen access1st authorCorresponding
We resolve an open question of Jain, Raskhodnikova, Sivakumar, and Smith (ICML 2023) by exhibiting a problem separating differential privacy under continual observation in the oblivious and adaptive settings. The continual observation (a.k.a. continual release) model formalizes privacy for streaming algorithms, where data is received over time and output is released at each time step. In the oblivious setting, privacy need only hold for data streams fixed in advance; in the adaptive setting, privacy is required even for streams that can be chosen adaptively based on the streaming algorithm's output. We describe the first explicit separation between the oblivious and adaptive settings. The problem showing this separation is based on the correlated vector queries problem of Bun, Steinke, and Ullman (SODA 2017). Specifically, we present an $(\varepsilon,0)$-DP algorithm for the oblivious setting that remains accurate for exponentially many time steps in the dimension of the input. On the other hand, we show that every $(\varepsilon,δ)$-DP adaptive algorithm fails to be accurate after releasing output for only a constant number of time steps.
Publisher DOI
Privately Learning Decision Lists and a Differentially Private Winnow
arXiv (Cornell University) · 2026-02-07
articleOpen access1st authorCorresponding
We give new differentially private algorithms for the classic problems of learning decision lists and large-margin halfspaces in the PAC and online models. In the PAC model, we give a computationally efficient algorithm for learning decision lists with minimal sample overhead over the best non-private algorithms. In the online model, we give a private analog of the influential Winnow algorithm for learning halfspaces with mistake bound polylogarithmic in the dimension and inverse polynomial in the margin. As an application, we describe how to privately learn decision lists in the online model, qualitatively matching state-of-the art non-private guarantees.
Publisher OA PDF
Privately Learning Decision Lists and a Differentially Private Winnow
Open MIND · 2026-02-07
preprint1st authorCorresponding
We give new differentially private algorithms for the classic problems of learning decision lists and large-margin halfspaces in the PAC and online models. In the PAC model, we give a computationally efficient algorithm for learning decision lists with minimal sample overhead over the best non-private algorithms. In the online model, we give a private analog of the influential Winnow algorithm for learning halfspaces with mistake bound polylogarithmic in the dimension and inverse polynomial in the margin. As an application, we describe how to privately learn decision lists in the online model, qualitatively matching state-of-the art non-private guarantees.
DOI
Interactive Proofs For Distribution Testing With Conditional Oracles
ArXiv.org · 2025-11-27
articleOpen access
We revisit the framework of interactive proofs for distribution testing, first introduced by Chiesa and Gur (ITCS 2018), which has recently experienced a surge in interest, accompanied by notable progress (e.g., Herman and Rothblum, STOC 2022, FOCS 2023; Herman, RANDOM~2024). In this model, a data-poor verifier determines whether a probability distribution has a property of interest by interacting with an all-powerful, data-rich but untrusted prover bent on convincing them that it has the property. While prior work gave sample-, time-, and communication-efficient protocols for testing and estimating a range of distribution properties, they all suffer from an inherent issue: for most interesting properties of distributions over a domain of size $N$, the verifier must draw at least $Ω(\sqrt{N})$ samples of its own. While sublinear in $N$, this is still prohibitive for large domains encountered in practice. In this work, we circumvent this limitation by augmenting the verifier with the ability to perform an exponentially smaller number of more powerful (but reasonable) \emph{pairwise conditional} queries, effectively enabling them to perform ``local comparison checks'' of the prover's claims. We systematically investigate the landscape of interactive proofs in this new setting, giving polylogarithmic query and sample protocols for (tolerantly) testing all \emph{label-invariant} properties, thus demonstrating exponential savings without compromising on communication, for this large and fundamental class of testing tasks.
Publisher OA PDF DOI
Enforcing Demographic Coherence: A Harms Aware Framework for Reasoning about Private Data Release
ArXiv.org · 2025-02-04
preprintOpen access1st authorCorresponding
The technical literature about data privacy largely consists of two complementary approaches: formal definitions of conditions sufficient for privacy preservation and attacks that demonstrate privacy breaches. Differential privacy is an accepted standard in the former sphere. However, differential privacy's powerful adversarial model and worst-case guarantees may make it too stringent in some situations, especially when achieving it comes at a significant cost to data utility. Meanwhile, privacy attacks aim to expose real and worrying privacy risks associated with existing data release processes but often face criticism for being unrealistic. Moreover, the literature on attacks generally does not identify what properties are necessary to defend against them. We address the gap between these approaches by introducing demographic coherence, a condition inspired by privacy attacks that we argue is necessary for data privacy. This condition captures privacy violations arising from inferences about individuals that are incoherent with respect to the demographic patterns in the data. Our framework focuses on confidence rated predictors, which can in turn be distilled from almost any data-informed process. Thus, we capture privacy threats that exist even when no attack is explicitly being carried out. Our framework not only provides a condition with respect to which data release algorithms can be analysed but suggests natural experimental evaluation methodologies that could be used to build practical intuition and make tangible assessment of risks. Finally, we argue that demographic coherence is weaker than differential privacy: we prove that every differentially private data release is also demographically coherent, and that there are demographically coherent algorithms which are not differentially private.
Publisher OA PDF DOI
Differentially Private Wasserstein Barycenters
ArXiv.org · 2025-10-03
preprintOpen access
The Wasserstein barycenter is defined as the mean of a set of probability measures under the optimal transport metric, and has numerous applications spanning machine learning, statistics, and computer graphics. In practice these input measures are empirical distributions built from sensitive datasets, motivating a differentially private (DP) treatment. We present, to our knowledge, the first algorithms for computing Wasserstein barycenters under differential privacy. Empirically, on synthetic data, MNIST, and large-scale U.S. population datasets, our methods produce high-quality private barycenters with strong accuracy-privacy tradeoffs.
Publisher OA PDF DOI
Oracle-Efficient Differentially Private Learning with Public Data
2024-01-01
article
Publisher DOI
Private PAC Learning May be Harder than Online Learning
arXiv (Cornell University) · 2024-02-16
preprintOpen access1st authorCorresponding
We continue the study of the computational complexity of differentially private PAC learning and how it is situated within the foundations of machine learning. A recent line of work uncovered a qualitative equivalence between the private PAC model and Littlestone's mistake-bounded model of online learning, in particular, showing that any concept class of Littlestone dimension $d$ can be privately PAC learned using $\mathrm{poly}(d)$ samples. This raises the natural question of whether there might be a generic conversion from online learners to private PAC learners that also preserves computational efficiency. We give a negative answer to this question under reasonable cryptographic assumptions (roughly, those from which it is possible to build indistinguishability obfuscation for all circuits). We exhibit a concept class that admits an online learner running in polynomial time with a polynomial mistake bound, but for which there is no computationally-efficient differentially private PAC learner. Our construction and analysis strengthens and generalizes that of Bun and Zhandry (TCC 2016-A), who established such a separation between private and non-private PAC learner.
Publisher OA PDF DOI
Differentially private confidence intervals for proportions under stratified random sampling
Electronic Journal of Statistics · 2024-01-01 · 6 citations
articleOpen access
Confidence intervals are a fundamental tool for quantifying the uncertainty of parameters of interest. With the increase of data privacy awareness, developing a private version of confidence intervals has gained growing attention from both statisticians and computer scientists. Differential privacy is a state-of-the-art framework for analyzing privacy loss when releasing statistics computed from sensitive data. Recent work has been done around differentially private confidence intervals, yet to the best of our knowledge, rigorous methodologies on differentially private confidence intervals in the context of survey sampling have not been studied. In this paper, we propose three differentially private algorithms for constructing confidence intervals for proportions under stratified random sampling. We articulate two variants of differential privacy that make sense for data from stratified sampling designs, analyzing each of our algorithms within one of these two variants. We establish analytical privacy guarantees and asymptotic properties of the estimators. In addition, we conduct simulation studies to evaluate the proposed private confidence intervals, and two applications to the 1940 Census data are provided.
Publisher OA PDF DOI

Recent grants

CRII: AF: The Polynomial Method in Learning, Communication, and Quantum Computation
NSF · $175k · 2020–2023

Frequent coauthors

Justin Thaler
36 shared
Thomas Steinke
26 shared
Marco Gaboardi
Boston University
15 shared
Salil Vadhan
Harvard University Press
10 shared
Kobbi Nissim
Georgetown University
9 shared
Jonathan Ullman
Northeastern University
7 shared
Uri Stemmer
7 shared
Robin Kothari
6 shared

Labs

Mark Bun's LabPI
Research in theoretical computer science, including data privacy, computational complexity, cryptography, and the foundations of machine learning.

Education

Ph.D.
Harvard

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Mark Bun

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you