
Anindya De
Associate Professor · University of Pennsylvania · Computer and Information Science
Active 2007–2026
Research topics
- Artificial Intelligence
- Computer Science
- Discrete mathematics
- Combinatorics
- Algorithm
- Mathematics
- Statistics
Selected publications
Halfspaces are hard to test with relative error
Society for Industrial and Applied Mathematics eBooks · 2026-01-01
book-chapter
Several recent works (Chen et al., SODA 2025; Chen et al., ICALP 2025; Chen et al., COLT 2025; Chen et al., manuscript) have studied a model of property testing of Boolean functions under a relative-error criterion. In this model, the distance from a target function \(f : \{0, 1\}^n \rightarrow \{0, 1\}\) that is being tested to a function \(g\) is defined relative to the number of inputs \(x\) for which \(f(x) = 1\); moreover, testing algorithms in this model have access both to a black-box oracle for \(f\) and to independent uniform satisfying assignments of \(f\). The motivation for this model is that it provides a natural framework for testing sparse Boolean functions that have few satisfying assignments, analogous to well-studied models for property testing of sparse graphs.
Model-agnostic super-resolution in high dimensions
ArXiv.org · 2025-11-11
preprint · Open access
The problem of super-resolution, roughly speaking, is to reconstruct an unknown signal to high accuracy, given (potentially noisy) information about its low-degree Fourier coefficients. Prior results on super-resolution have imposed strong modeling assumptions on the signal, typically requiring that it is a linear combination of spatially separated point sources. In this work we analyze a very general version of the super-resolution problem by considering completely general non-negative signals (equivalently, distributions) over the $d$-dimensional torus $[0,1)^d$; we do not assume any spatial separation between point sources, or even that the distribution is a finite linear combination of point sources. The question naturally arises: what can be said about super-resolution in such a general setting?
- As a warm-up, we first give a set of results for reconstructing distributions under the Wasserstein distance. We establish essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $\kappa$ of the noise for which accurate reconstruction is possible: we show that for $d$-dimensional distributions, estimates of $\approx \exp(d)$ many Fourier coefficients are both necessary and sufficient for accurate Wasserstein reconstruction.
- As our main result, we define a new notion of "heavy hitter" reconstruction for distributions, which essentially amounts to achieving high-accuracy reconstruction of all "sufficiently dense" regions of the distribution. We give essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $\kappa$ of the noise for which accurate reconstruction is possible under this notion. Our results show that (in sharp contrast with Wasserstein reconstruction) accurate estimates of only $\approx \exp(\sqrt{d})$ many Fourier coefficients are both necessary and sufficient for heavy hitter reconstruction.
Halfspaces are hard to test with relative error
arXiv (Cornell University) · 2025-11-09
preprint · Open access
Several recent works [DHLNSY25, CPPS25a, CPPS25b] have studied a model of property testing of Boolean functions under a \emph{relative-error} criterion. In this model, the distance from a target function $f: \{0,1\}^n \to \{0,1\}$ that is being tested to a function $g$ is defined relative to the number of inputs $x$ for which $f(x)=1$; moreover, testing algorithms in this model have access both to a black-box oracle for $f$ and to independent uniform satisfying assignments of $f$. The motivation for this model is that it provides a natural framework for testing \emph{sparse} Boolean functions that have few satisfying assignments, analogous to well-studied models for property testing of sparse graphs. The main result of this paper is a lower bound for testing \emph{halfspaces} (i.e., linear threshold functions) in the relative-error model: we show that $\tilde{\Omega}(\log n)$ oracle calls are required for any relative-error halfspace testing algorithm over the Boolean hypercube $\{0,1\}^n$. This stands in sharp contrast both with the constant-query testability (independent of $n$) of halfspaces in the standard model [MORS10], and with the positive results for relative-error testing of many other classes given in [DHLNSY25, CPPS25a, CPPS25b]. Our lower bound for halfspaces gives the first example of a well-studied class of functions for which relative-error testing is provably more difficult than standard-model testing.
Relative-error monotonicity testing
Society for Industrial and Applied Mathematics eBooks · 2025-01-01
book-chapter
The standard model of Boolean function property testing is not well suited for testing sparse functions which have few satisfying assignments, since every such function is close (in the usual Hamming distance metric) to the constant-0 function. In this work we propose and investigate a new model for property testing of Boolean functions, called relative-error testing, which provides a natural framework for testing sparse functions.
Lower Bounds for Convexity Testing
Society for Industrial and Applied Mathematics eBooks · 2025-01-01
book-chapter
We consider the problem of testing whether an unknown and arbitrary set S ⊆ ℝ^n (given as a black-box membership oracle) is convex, versus ε-far from every convex set, under the standard Gaussian distribution.
Testing noisy low-degree polynomials for sparsity
ArXiv.org · 2025-11-11
preprint · Open access
We consider the problem of testing whether an unknown low-degree polynomial $p$ over $\mathbb{R}^n$ is sparse versus far from sparse, given access to noisy evaluations of the polynomial $p$ at \emph{randomly chosen points}. This is a property-testing analogue of classical problems on learning sparse low-degree polynomials with noise, extending the work of Chen, De, and Servedio (2020) from noisy \emph{linear} functions to general low-degree polynomials. Our main result gives a \emph{precise characterization} of when sparsity testing for low-degree polynomials admits constant sample complexity independent of dimension, together with a matching constant-sample algorithm in that regime. For any mean-zero, variance-one finitely supported distribution $\boldsymbol{X}$ over the reals, degree $d$, and any sparsity parameters $s \leq T$, we define a computable function $\mathrm{MSG}_{\boldsymbol{X},d}(\cdot)$, and:
- For $T \ge \mathrm{MSG}_{\boldsymbol{X},d}(s)$, we give an $O_{s,\boldsymbol{X},d}(1)$-sample algorithm that distinguishes whether a multilinear degree-$d$ polynomial over $\mathbb{R}^n$ is $s$-sparse versus $\varepsilon$-far from $T$-sparse, given examples $(\boldsymbol{x},\, p(\boldsymbol{x}) + \mathrm{noise})_{\boldsymbol{x} \sim \boldsymbol{X}^{\otimes n}}$. Crucially, the sample complexity is \emph{completely independent} of the ambient dimension $n$.
- For $T \leq \mathrm{MSG}_{\boldsymbol{X},d}(s) - 1$, we show that even without noise, any algorithm given samples $(\boldsymbol{x},p(\boldsymbol{x}))_{\boldsymbol{x} \sim \boldsymbol{X}^{\otimes n}}$ must use $\Omega_{\boldsymbol{X},d,s}(\log n)$ examples.
Our techniques employ a generalization of the results of Dinur et al. (2007) on the Fourier tails of bounded functions over $\{0,1\}^n$ to a broad range of finitely supported distributions, which may be of independent interest.
Mathematical Statistics and Learning · 2025-07-07
article · Open access · 1st author · Corresponding
We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e., altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^{n}$. As our main algorithmic results, (1) we give an $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_{n})$ from $N(0,I_{n})$ conditioned on an unknown and arbitrary convex set $S$; (2) we give a different $O(n)$-sample algorithm that can distinguish $N(0,I_{n})$ from $N(0,I_{n})$ conditioned on an unknown and arbitrary mixture of symmetric convex sets. Both our algorithms are computationally efficient and run in $O(n^{2})$ time, which is linear in the size of the input. These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution or learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially $n^{O(\sqrt{n})}$ samples. An easy argument shows that no finite number of samples suffices to distinguish $N(0,I_{n})$ from an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible. We also prove that any algorithm (computationally efficient or otherwise) that can distinguish $N(0,I_{n})$ from $N(0,I_{n})$ conditioned on an unknown symmetric convex set must use $\Omega(n)$ samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.
Detecting Low-Degree Truncation
2024-06-10 · 2 citations
article · Open access · 1st author · Corresponding
We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution D over ℝ^n and a collection of data points in ℝ^n, distinguish between the two possibilities that (i) the data was drawn from D, versus (ii) the data was drawn from D|_S, i.e. from D subject to truncation by an unknown truncation set S ⊆ ℝ^n. We study this problem in the setting where D is a high-dimensional i.i.d. product distribution and S is an unknown degree-d polynomial threshold function (one of the most well-studied types of Boolean-valued function over ℝ^n). Our main results are an efficient algorithm when D is a hypercontractive distribution, and a matching lower bound:
1. For any constant d, we give a polynomial-time algorithm which successfully distinguishes D from D|_S using O(n^{d/2}) samples (subject to mild technical conditions on D and S);
2. Even for the simplest case of D being the uniform distribution over {±1}^n, we show that for any constant d, any distinguishing algorithm for degree-d polynomial threshold functions must use Ω(n^{d/2}) samples.
Detecting Low-Degree Truncation
arXiv (Cornell University) · 2024-02-12
preprint · Open access · 1st author · Corresponding
We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution ${\cal D}$ over $\mathbb{R}^n$ and a collection of data points in $\mathbb{R}^n$, distinguish between the two possibilities that (i) the data was drawn from ${\cal D}$, versus (ii) the data was drawn from ${\cal D}|_S$, i.e. from ${\cal D}$ subject to truncation by an unknown truncation set $S \subseteq \mathbb{R}^n$. We study this problem in the setting where ${\cal D}$ is a high-dimensional i.i.d. product distribution and $S$ is an unknown degree-$d$ polynomial threshold function (one of the most well-studied types of Boolean-valued function over $\mathbb{R}^n$). Our main results are an efficient algorithm when ${\cal D}$ is a hypercontractive distribution, and a matching lower bound:
- For any constant $d$, we give a polynomial-time algorithm which successfully distinguishes ${\cal D}$ from ${\cal D}|_S$ using $O(n^{d/2})$ samples (subject to mild technical conditions on ${\cal D}$ and $S$);
- Even for the simplest case of ${\cal D}$ being the uniform distribution over $\{+1, -1\}^n$, we show that for any constant $d$, any distinguishing algorithm for degree-$d$ polynomial threshold functions must use $\Omega(n^{d/2})$ samples.
Relative-error monotonicity testing
arXiv (Cornell University) · 2024-10-11
preprint · Open access
The standard model of Boolean function property testing is not well suited for testing $\textit{sparse}$ functions which have few satisfying assignments, since every such function is close (in the usual Hamming distance metric) to the constant-0 function. In this work we propose and investigate a new model for property testing of Boolean functions, called $\textit{relative-error testing}$, which provides a natural framework for testing sparse functions. This new model defines the distance between two functions $f, g: \{0,1\}^n \to \{0,1\}$ to be $$\textsf{reldist}(f,g) := { \frac{|f^{-1}(1) \triangle g^{-1}(1)|} {|f^{-1}(1)|}}.$$ This is a more demanding distance measure than the usual Hamming distance ${ {|f^{-1}(1) \triangle g^{-1}(1)|}/{2^n}}$ when $|f^{-1}(1)| \ll 2^n$; to compensate for this, algorithms in the new model have access both to a black-box oracle for the function $f$ being tested and to a source of independent uniform satisfying assignments of $f$. In this paper we first give a few general results about the relative-error testing model; then, as our main technical contribution, we give a detailed study of algorithms and lower bounds for relative-error testing of $\textit{monotone}$ Boolean functions. We give upper and lower bounds which are parameterized by $N=|f^{-1}(1)|$, the sparsity of the function $f$ being tested. Our results show that there are interesting differences between relative-error monotonicity testing of sparse Boolean functions, and monotonicity testing in the standard model. These results motivate further study of the testability of Boolean function properties in the relative-error model.
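The two distance measures in the abstract above can be compared on a toy example. The following is a minimal brute-force sketch, not code from the paper: the helper names `satisfying_set`, `reldist`, and `hamming_dist`, and the example functions `f_and` and `g_zero`, are illustrative assumptions, and full enumeration of $\{0,1\}^n$ is only feasible for small $n$.

```python
from itertools import product


def satisfying_set(f, n):
    """Return f^{-1}(1): all x in {0,1}^n with f(x) == 1 (brute force)."""
    return {x for x in product((0, 1), repeat=n) if f(x) == 1}


def reldist(f, g, n):
    """|f^{-1}(1) symmetric-difference g^{-1}(1)| / |f^{-1}(1)|."""
    f_ones = satisfying_set(f, n)
    g_ones = satisfying_set(g, n)
    if not f_ones:
        # The ratio is undefined when f has no satisfying assignments.
        raise ValueError("reldist is undefined when f is identically 0")
    return len(f_ones ^ g_ones) / len(f_ones)


def hamming_dist(f, g, n):
    """|f^{-1}(1) symmetric-difference g^{-1}(1)| / 2^n."""
    return len(satisfying_set(f, n) ^ satisfying_set(g, n)) / 2 ** n


def f_and(x):
    """Hypothetical sparse function: AND of all bits (one satisfying input)."""
    return int(all(x))


def g_zero(x):
    """The constant-0 function."""
    return 0


n = 4
print(reldist(f_and, g_zero, n))       # 1.0
print(hamming_dist(f_and, g_zero, n))  # 0.0625
```

Here the sparse function is at Hamming distance $1/2^n$ from the constant-0 function but at relative distance 1, which is the gap the abstract describes.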
Recent grants
AF: Small: Threshold Functions--Derandomization, Testing and Applications
NSF · $400k · 2020–2024
AF: Small: Collaborative Research: Boolean Function Analysis Meets Stochastic Design
NSF · $294k · 2019–2023
Frequent coauthors
- Rocco A. Servedio · 111 shared
- Ilias Diakonikolas · 35 shared
- Elchanan Mossel · 22 shared
- Shivam Nadimpalli · Massachusetts Institute of Technology · 20 shared
- Joe Neeman · The University of Texas at Austin · 19 shared
- Chin Ho Lee · North Carolina State University · 16 shared
- Sandip Sinha · Columbia University · 13 shared
- Thomas Vidick · 13 shared
Labs
De Anindya Lab · PI