Anindya De

Associate Professor · Verified

University of Pennsylvania · Computer and Information Science

Active 2007–2026

h-index: 20
Citations: 1.3k
Papers: 159 (48 in the last 5 years)
Funding: $694k

Research topics

  • Artificial Intelligence
  • Computer Science
  • Discrete mathematics
  • Combinatorics
  • Algorithm
  • Mathematics
  • Statistics

Selected publications

  • Halfspaces are hard to test with relative error

    Society for Industrial and Applied Mathematics eBooks · 2026-01-01

    book-chapter

    Several recent works (Chen et al., SODA 2025; Chen et al., ICALP 2025; Chen et al., COLT 2025; Chen et al., manuscript) have studied a model of property testing of Boolean functions under a relative-error criterion. In this model, the distance from a target function \(f : \{0, 1\}^n \rightarrow \{0, 1\}\) that is being tested to a function \(g\) is defined relative to the number of inputs \(x\) for which \(f(x) = 1\); moreover, testing algorithms in this model have access both to a black-box oracle for \(f\) and to independent uniform satisfying assignments of \(f\). The motivation for this model is that it provides a natural framework for testing sparse Boolean functions that have few satisfying assignments, analogous to well-studied models for property testing of sparse graphs.

  • Model-agnostic super-resolution in high dimensions

    ArXiv.org · 2025-11-11

    preprint · Open access

    The problem of super-resolution, roughly speaking, is to reconstruct an unknown signal to high accuracy, given (potentially noisy) information about its low-degree Fourier coefficients. Prior results on super-resolution have imposed strong modeling assumptions on the signal, typically requiring that it is a linear combination of spatially separated point sources. In this work we analyze a very general version of the super-resolution problem by considering completely general non-negative signals (equivalently, distributions) over the $d$-dimensional torus $[0,1)^d$; we do not assume any spatial separation between point sources, or even that the distribution is a finite linear combination of point sources. The question naturally arises: what can be said about super-resolution in such a general setting? - As a warm-up, we first give a set of results for reconstructing distributions under the Wasserstein distance. We establish essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $κ$ of the noise for which accurate reconstruction is possible: we show that for $d$-dimensional distributions, estimates of $\approx \exp(d)$ many Fourier coefficients are both necessary and sufficient for accurate Wasserstein reconstruction. - As our main result, we define a new notion of "heavy hitter" reconstruction for distributions, which essentially amounts to achieving high-accuracy reconstruction of all "sufficiently dense" regions of the distribution. We give essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $κ$ of the noise for which accurate reconstruction is possible under this notion. Our results show that (in sharp contrast with Wasserstein reconstruction) accurate estimates of only $\approx \exp(\sqrt{d})$ many Fourier coefficients are both necessary and sufficient for heavy hitter reconstruction.

  • Halfspaces are hard to test with relative error

    arXiv (Cornell University) · 2025-11-09

    preprint · Open access

    Several recent works [DHLNSY25, CPPS25a, CPPS25b] have studied a model of property testing of Boolean functions under a \emph{relative-error} criterion. In this model, the distance from a target function $f: \{0,1\}^n \to \{0,1\}$ that is being tested to a function $g$ is defined relative to the number of inputs $x$ for which $f(x)=1$; moreover, testing algorithms in this model have access both to a black-box oracle for $f$ and to independent uniform satisfying assignments of $f$. The motivation for this model is that it provides a natural framework for testing \emph{sparse} Boolean functions that have few satisfying assignments, analogous to well-studied models for property testing of sparse graphs. The main result of this paper is a lower bound for testing \emph{halfspaces} (i.e., linear threshold functions) in the relative error model: we show that $\tilde{\Omega}(\log n)$ oracle calls are required for any relative-error halfspace testing algorithm over the Boolean hypercube $\{0,1\}^n$. This stands in sharp contrast both with the constant-query testability (independent of $n$) of halfspaces in the standard model [MORS10], and with the positive results for relative-error testing of many other classes given in [DHLNSY25, CPPS25a, CPPS25b]. Our lower bound for halfspaces gives the first example of a well-studied class of functions for which relative-error testing is provably more difficult than standard-model testing.

  • Relative-error monotonicity testing

    Society for Industrial and Applied Mathematics eBooks · 2025-01-01

    book-chapter

    The standard model of Boolean function property testing is not well suited for testing sparse functions which have few satisfying assignments, since every such function is close (in the usual Hamming distance metric) to the constant-0 function. In this work we propose and investigate a new model for property testing of Boolean functions, called relative-error testing, which provides a natural framework for testing sparse functions.

  • Lower Bounds for Convexity Testing

    Society for Industrial and Applied Mathematics eBooks · 2025-01-01

    book-chapter

    We consider the problem of testing whether an unknown and arbitrary set S ⊆ ℝ^n (given as a black-box membership oracle) is convex, versus ε-far from every convex set, under the standard Gaussian distribution.

  • Testing noisy low-degree polynomials for sparsity

    ArXiv.org · 2025-11-11

    preprint · Open access

    We consider the problem of testing whether an unknown low-degree polynomial $p$ over $\mathbb{R}^n$ is sparse versus far from sparse, given access to noisy evaluations of the polynomial $p$ at \emph{randomly chosen points}. This is a property-testing analogue of classical problems on learning sparse low-degree polynomials with noise, extending the work of Chen, De, and Servedio (2020) from noisy \emph{linear} functions to general low-degree polynomials. Our main result gives a \emph{precise characterization} of when sparsity testing for low-degree polynomials admits constant sample complexity independent of dimension, together with a matching constant-sample algorithm in that regime. For any mean-zero, variance-one finitely supported distribution $\boldsymbol{X}$ over the reals, degree $d$, and any sparsity parameters $s \leq T$, we define a computable function $\mathrm{MSG}_{\boldsymbol{X},d}(\cdot)$, and: - For $T \ge \mathrm{MSG}_{\boldsymbol{X},d}(s)$, we give an $O_{s,\boldsymbol{X},d}(1)$-sample algorithm that distinguishes whether a multilinear degree-$d$ polynomial over $\mathbb{R}^n$ is $s$-sparse versus $\varepsilon$-far from $T$-sparse, given examples $(\boldsymbol{x},\, p(\boldsymbol{x}) + \mathrm{noise})_{\boldsymbol{x} \sim \boldsymbol{X}^{\otimes n}}$. Crucially, the sample complexity is \emph{completely independent} of the ambient dimension $n$. - For $T \leq \mathrm{MSG}_{\boldsymbol{X},d}(s) - 1$, we show that even without noise, any algorithm given samples $(\boldsymbol{x},p(\boldsymbol{x}))_{\boldsymbol{x} \sim \boldsymbol{X}^{\otimes n}}$ must use $Ω_{\boldsymbol{X},d,s}(\log n)$ examples. Our techniques employ a generalization of the results of Dinur et al. (2007) on the Fourier tails of bounded functions over $\{0,1\}^n$ to a broad range of finitely supported distributions, which may be of independent interest.

  • Testing convex truncation

    Mathematical Statistics and Learning · 2025-07-07

    article · Open access · 1st author · Corresponding

    We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e., altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^{n}$. As our main algorithmic results, (1) we give an $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_{n})$ from $N(0,I_{n})$ conditioned on an unknown and arbitrary convex set $S$; (2) we give a different $O(n)$-sample algorithm that can distinguish $N(0,I_{n})$ from $N(0,I_{n})$ conditioned on an unknown and arbitrary mixture of symmetric convex sets. Both our algorithms are computationally efficient and run in $O(n^{2})$ time, which is linear in the size of the input. These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution or learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially $n^{O(\sqrt{n})}$ samples. An easy argument shows that no finite number of samples suffices to distinguish $N(0,I_{n})$ from an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible. We also prove that any algorithm (computationally efficient or otherwise) that can distinguish $N(0,I_{n})$ from $N(0,I_{n})$ conditioned on an unknown symmetric convex set must use $\Omega(n)$ samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.

  • Detecting Low-Degree Truncation

    2024-06-10 · 2 citations

    article · Open access · 1st author · Corresponding

    We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution D over ℝ^n and a collection of data points in ℝ^n, distinguish between the two possibilities that (i) the data was drawn from D, versus (ii) the data was drawn from D|_S, i.e. from D subject to truncation by an unknown truncation set S ⊆ ℝ^n. We study this problem in the setting where D is a high-dimensional i.i.d. product distribution and S is an unknown degree-d polynomial threshold function (one of the most well-studied types of Boolean-valued function over ℝ^n). Our main results are an efficient algorithm when D is a hypercontractive distribution, and a matching lower bound: 1. For any constant d, we give a polynomial-time algorithm which successfully distinguishes D from D|_S using O(n^{d/2}) samples (subject to mild technical conditions on D and S); 2. Even for the simplest case of D being the uniform distribution over {±1}^n, we show that for any constant d, any distinguishing algorithm for degree-d polynomial threshold functions must use Ω(n^{d/2}) samples.

  • Detecting Low-Degree Truncation

    arXiv (Cornell University) · 2024-02-12

    preprint · Open access · 1st author · Corresponding

    We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution ${\cal D}$ over $\mathbb{R}^n$ and a collection of data points in $\mathbb{R}^n$, distinguish between the two possibilities that (i) the data was drawn from ${\cal D}$, versus (ii) the data was drawn from ${\cal D}|_S$, i.e. from ${\cal D}$ subject to truncation by an unknown truncation set $S \subseteq \mathbb{R}^n$. We study this problem in the setting where ${\cal D}$ is a high-dimensional i.i.d. product distribution and $S$ is an unknown degree-$d$ polynomial threshold function (one of the most well-studied types of Boolean-valued function over $\mathbb{R}^n$). Our main results are an efficient algorithm when ${\cal D}$ is a hypercontractive distribution, and a matching lower bound: $\bullet$ For any constant $d$, we give a polynomial-time algorithm which successfully distinguishes ${\cal D}$ from ${\cal D}|_S$ using $O(n^{d/2})$ samples (subject to mild technical conditions on ${\cal D}$ and $S$); $\bullet$ Even for the simplest case of ${\cal D}$ being the uniform distribution over $\{+1, -1\}^n$, we show that for any constant $d$, any distinguishing algorithm for degree-$d$ polynomial threshold functions must use $Ω(n^{d/2})$ samples.

  • Relative-error monotonicity testing

    arXiv (Cornell University) · 2024-10-11

    preprint · Open access

    The standard model of Boolean function property testing is not well suited for testing $\textit{sparse}$ functions which have few satisfying assignments, since every such function is close (in the usual Hamming distance metric) to the constant-0 function. In this work we propose and investigate a new model for property testing of Boolean functions, called $\textit{relative-error testing}$, which provides a natural framework for testing sparse functions. This new model defines the distance between two functions $f, g: \{0,1\}^n \to \{0,1\}$ to be $$\textsf{reldist}(f,g) := { \frac{|f^{-1}(1) \triangle g^{-1}(1)|} {|f^{-1}(1)|}}.$$ This is a more demanding distance measure than the usual Hamming distance ${ {|f^{-1}(1) \triangle g^{-1}(1)|}/{2^n}}$ when $|f^{-1}(1)| \ll 2^n$; to compensate for this, algorithms in the new model have access both to a black-box oracle for the function $f$ being tested and to a source of independent uniform satisfying assignments of $f$. In this paper we first give a few general results about the relative-error testing model; then, as our main technical contribution, we give a detailed study of algorithms and lower bounds for relative-error testing of $\textit{monotone}$ Boolean functions. We give upper and lower bounds which are parameterized by $N=|f^{-1}(1)|$, the sparsity of the function $f$ being tested. Our results show that there are interesting differences between relative-error monotonicity testing of sparse Boolean functions, and monotonicity testing in the standard model. These results motivate further study of the testability of Boolean function properties in the relative-error model.
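The reldist definition quoted in the abstract above can be illustrated with a small brute-force sketch. This is only an illustration of the two distance formulas, not the testing algorithms from the paper; the function names (`satisfying_set`, `hamming_dist`, `rel_dist`) and the example functions `f` and `g` are hypothetical, and the enumeration over all of {0,1}^n is feasible only for small n.

```python
from itertools import product


def satisfying_set(f, n):
    """Return the set of inputs x in {0,1}^n with f(x) = 1 (brute force)."""
    return {x for x in product((0, 1), repeat=n) if f(x) == 1}


def hamming_dist(f, g, n):
    """Standard distance: |f^{-1}(1) symmetric-difference g^{-1}(1)| / 2^n."""
    return len(satisfying_set(f, n) ^ satisfying_set(g, n)) / 2 ** n


def rel_dist(f, g, n):
    """Relative distance: |f^{-1}(1) symmetric-difference g^{-1}(1)| / |f^{-1}(1)|."""
    ones_f = satisfying_set(f, n)
    if not ones_f:
        raise ValueError("reldist is undefined when f has no satisfying assignments")
    return len(ones_f ^ satisfying_set(g, n)) / len(ones_f)


n = 10
f = lambda x: int(all(x))  # hypothetical sparse function: only the all-ones input satisfies it
g = lambda x: 0            # the constant-0 function

hamming = hamming_dist(f, g, n)  # tiny, because f has a single satisfying assignment
rel = rel_dist(f, g, n)          # large, because the whole satisfying set of f differs from g
print(hamming, rel)
```

In this toy case the sparse function is within Hamming distance 2^-10 of the constant-0 function, yet its relative distance to it is 1, which is the gap the abstract describes.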

Frequent coauthors

  • Rocco A. Servedio

    111 shared
  • Ilias Diakonikolas

    35 shared
  • Elchanan Mossel

    22 shared
  • Shivam Nadimpalli

    Massachusetts Institute of Technology

    20 shared
  • Joe Neeman

    The University of Texas at Austin

    19 shared
  • Chin Ho Lee

    North Carolina State University

    16 shared
  • Sandip Sinha

    Columbia University

    13 shared
  • Thomas Vidick

    13 shared

Labs

  • De Anindya Lab · PI
