Ilias Diakonikolas

Sheldon B. Lubar Professor

University of Wisconsin-Madison · Computer Sciences

Active 2007–2025

h-index: 43

Citations: 6.4k

Papers: 396 (137 in the last 5 years)

Funding: $548k

About

Ilias Diakonikolas is the Sheldon B. Lubar Professor in the Department of Computer Sciences at UW-Madison. He is a member of the Theory of Computing group, machine learning@uw-madison, and the Institute for Foundations of Data Science. His affiliations also include the Department of Statistics, the Wisconsin Institute for Discovery, and the Data Science Institute.

Before joining UW-Madison, he held the Andrew and Erna Viterbi Early Career Chair in Computer Science at USC and was a faculty member at the University of Edinburgh. He also spent two years at UC Berkeley as the Simons Postdoctoral Fellow in Theoretical Computer Science. He received his Ph.D. in Computer Science from Columbia University, advised by Mihalis Yannakakis, and completed his undergraduate studies at the National Technical University of Athens in Greece.

His research spans algorithms and machine learning. It focuses on the tradeoffs between statistical efficiency, computational efficiency, and robustness in fundamental problems in statistics and machine learning. Specific areas include high-dimensional robust statistics, information-computation tradeoffs, foundations of deep learning, nonparametric estimation, distribution testing, and data-driven algorithm design.

His awards include the ACM Grace Murray Hopper Award, a Guggenheim Fellowship, a Sloan Fellowship, an NSF CAREER Award, a Marie Curie Fellowship, a Google Faculty Research Award, and best paper awards at NeurIPS and COLT. His research has been supported by the NSF, DARPA, ONR, EPSRC, WARF, Google, and the European Commission.

Research topics

Artificial Intelligence
Computer Science
Combinatorics
Mathematics
Algorithm
Statistics
Discrete mathematics

Selected publications

SoS Certifiability of Subgaussian Distributions and Its Algorithmic Applications
2025-06-15 · 2 citations
article · 1st author · Corresponding
Agnostic Product Mixed State Tomography via Robust Statistics
arXiv · 2025-10-09
preprint · Open access
We study the complexity of two closely related learning problems, one quantum and one classical. In the quantum setting, we consider agnostic tomography for the natural class of product mixed states. Given $N$ copies of an $n$-qubit state $ρ$, the goal is to output a nearly optimal product mixed state approximation in trace distance. While recent work has focused on pure-state ansatz (e.g., product or stabilizer states), no polynomial-time guarantees were previously known for mixed-state ansatz. In the classical setting, we study robust learning of binary product distributions: given samples from an unknown distribution on $\{0,1\}^n$, the goal is to output a nearly optimal product approximation. Our main contributions are as follows. (1) We give a semi-agnostic tomography algorithm for product mixed states with polynomial sample and computational complexity achieving error $O(\mathrm{opt}\log(1/\mathrm{opt}))$, where $\mathrm{opt}$ is the trace distance to the best product approximation. This is the first efficient algorithm with any nontrivial agnostic guarantee for mixed-state ansatz, using only single-qubit, single-copy measurements. We also prove a Quantum Statistical Query lower bound showing near-optimality, and an unconditional lower bound demonstrating that adaptivity is necessary under single-qubit measurements. (2) We give a semi-agnostic algorithm for robustly learning binary product distributions with matching guarantees and establish a Statistical Query lower bound, essentially resolving the efficient robust learnability of this class and improving on prior work since Diakonikolas et al. (2016).
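In the classical setting, $\mathrm{opt}$ is the distance from the unknown distribution to its best product approximation. The sketch below makes that quantity concrete for small $n$. It assumes total variation distance as the classical counterpart of trace distance. It fits a product distribution by empirical marginals, which is the standard non-robust estimator and not the paper's semi-agnostic algorithm, and it computes the distance by enumerating $\{0,1\}^n$. The function names are hypothetical. Because the marginal fit is not necessarily the closest product distribution, the computed distance only upper-bounds $\mathrm{opt}$.

```python
from collections import Counter
from itertools import product


def empirical_marginals(samples):
    """Estimate p_i = Pr[x_i = 1] by the sample mean of coordinate i (non-robust)."""
    m = len(samples)
    n = len(samples[0])
    return [sum(s[i] for s in samples) / m for i in range(n)]


def product_prob(x, p):
    """Probability of bit string x under the product distribution with marginals p."""
    prob = 1.0
    for xi, pi in zip(x, p):
        prob *= pi if xi == 1 else 1.0 - pi
    return prob


def tv_to_product(samples, p):
    """Total variation distance between the empirical distribution of `samples`
    and the product distribution with marginals p, by brute force over {0,1}^n."""
    m = len(samples)
    counts = Counter(tuple(s) for s in samples)
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        total += abs(counts.get(x, 0) / m - product_prob(x, p))
    return total / 2


# Two perfectly correlated bits: no product distribution is close to this.
correlated = [(0, 0), (1, 1)]
p_hat = empirical_marginals(correlated)
print(p_hat, tv_to_product(correlated, p_hat))  # prints [0.5, 0.5] 0.5
```

Brute-force enumeration is exponential in $n$ and is only meant to make the definition tangible. It plays no part in an efficient learner.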
Robustly Learning Monotone Generalized Linear Models via Data Augmentation
arXiv · 2025-02-12
preprint · Open access
We study the task of learning Generalized Linear Models (GLMs) in the agnostic model under the Gaussian distribution. We give the first polynomial-time algorithm that achieves a constant-factor approximation for any monotone Lipschitz activation. Prior constant-factor GLM learners succeed for a substantially smaller class of activations. Our work resolves a well-known open problem by developing a robust counterpart to the classical GLMtron algorithm (Kakade et al., 2011). Our robust learner applies more generally, encompassing all monotone activations with bounded $(2+ζ)$-moments, for any fixed $ζ>0$; this condition is essentially necessary. To obtain our results, we leverage a novel data augmentation technique with decreasing Gaussian noise injection and prove a number of structural results that may be useful in other settings.
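The abstract describes its learner as a robust counterpart to the classical GLMtron algorithm (Kakade et al., 2011). For orientation, here is a minimal sketch of that classical, non-robust GLMtron update, $w \leftarrow w + \frac{1}{m}\sum_i (y_i - u(\langle w, x_i \rangle))\, x_i$. It does not include the paper's data-augmentation step. The unit step size, the iteration count, and the noiseless synthetic ReLU data are illustrative assumptions.

```python
import random


def glmtron(samples, activation, dim, iters=50):
    """Classical GLMtron (non-robust), starting from w = 0:
    w <- w + (1/m) * sum_i (y_i - u(<w, x_i>)) * x_i."""
    w = [0.0] * dim
    m = len(samples)
    for _ in range(iters):
        step = [0.0] * dim
        for x, y in samples:
            residual = y - activation(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(dim):
                step[j] += residual * x[j]
        w = [wj + sj / m for wj, sj in zip(w, step)]
    return w


def relu(z):
    return max(0.0, z)


# Noiseless synthetic data: Gaussian covariates, ReLU activation,
# and a hypothetical target vector w_star.
rng = random.Random(0)
w_star = [1.0, -0.5, 0.25]
data = []
for _ in range(2000):
    x = [rng.gauss(0.0, 1.0) for _ in range(3)]
    data.append((x, relu(sum(a * b for a, b in zip(w_star, x)))))

w_hat = glmtron(data, relu, dim=3)
print([round(v, 3) for v in w_hat])
```

On clean data like this, the update recovers `w_star`. The agnostic setting the paper targets is where this plain update breaks down, because adversarial labels can pull the residual-weighted average arbitrarily.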
Statistical Query Hardness of Multiclass Linear Classification with Random Classification Noise
arXiv · 2025-02-17
preprint · Open access · 1st author · Corresponding
We study the task of Multiclass Linear Classification (MLC) in the distribution-free PAC model with Random Classification Noise (RCN). Specifically, the learner is given a set of labeled examples $(x, y)$, where $x$ is drawn from an unknown distribution on $\mathbb{R}^d$ and the labels are generated by a multiclass linear classifier corrupted with RCN. That is, the label $y$ is flipped from $i$ to $j$ with probability $H_{ij}$ according to a known noise matrix $H$ with non-negative separation $σ := \min_{i \neq j} (H_{ii} - H_{ij})$. The goal is to compute a hypothesis with small 0-1 error. For the special case of two labels, prior work has given polynomial-time algorithms achieving the optimal error. Surprisingly, little is known about the complexity of this task even for three labels. As our main contribution, we show that the complexity of MLC with RCN becomes drastically different in the presence of three or more labels. Specifically, we prove super-polynomial Statistical Query (SQ) lower bounds for this problem. In more detail, even for three labels and constant separation, we give a super-polynomial lower bound on the complexity of any SQ algorithm achieving optimal error. For a larger number of labels and smaller separation, we show a super-polynomial SQ lower bound even for the weaker goal of achieving any constant factor approximation to the optimal loss or even beating the trivial hypothesis.
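The noise model above is fully specified by the matrix $H$. The sketch below computes the separation $σ = \min_{i \neq j} (H_{ii} - H_{ij})$ and simulates one RCN corruption. Row $i$ of $H$ is treated as the distribution of the observed label when the true label is $i$. The example matrix and the function names are hypothetical.

```python
import random


def separation(H):
    """sigma := min over i != j of (H[i][i] - H[i][j])."""
    k = len(H)
    return min(H[i][i] - H[i][j] for i in range(k) for j in range(k) if i != j)


def apply_rcn(label, H, rng):
    """Flip `label` from i to j with probability H[i][j]; each row of H sums to 1."""
    r = rng.random()
    cumulative = 0.0
    for j, p in enumerate(H[label]):
        cumulative += p
        if r < cumulative:
            return j
    return len(H[label]) - 1  # guard against floating-point round-off in the row sum


# A hypothetical 3-label noise matrix (rows: true label, columns: observed label).
H = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.05, 0.15, 0.8],
]
print(separation(H))  # about 0.5, attained by H[1][1] - H[1][0]

rng = random.Random(1)
observed = [apply_rcn(1, H, rng) for _ in range(20000)]
print([observed.count(j) / len(observed) for j in range(3)])  # close to row H[1]
```

Positive separation means that each true label remains the single most likely observed label. The SQ lower bounds above show that, from three labels onward, this guarantee alone does not make learning computationally easy.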
Batch List-Decodable Linear Regression via Higher Moments
arXiv · 2025-03-12
preprint · Open access · 1st author · Corresponding
We study the task of list-decodable linear regression using batches. A batch is called clean if it consists of i.i.d. samples from an unknown linear regression distribution. For a parameter $α\in (0, 1/2)$, an unknown $α$-fraction of the batches are clean and no assumptions are made on the remaining ones. The goal is to output a small list of vectors at least one of which is close to the true regressor vector in $\ell_2$-norm. [DJKS23] gave an efficient algorithm, under natural distributional assumptions, with the following guarantee. Assuming that the batch size $n$ satisfies $n \geq \tilde{Ω}(α^{-1})$ and the number of batches is $m = \mathrm{poly}(d, n, 1/α)$, their algorithm runs in polynomial time and outputs a list of $O(1/α^2)$ vectors at least one of which is $\tilde{O}(α^{-1/2}/\sqrt{n})$ close to the target regressor. Here we design a new polynomial-time algorithm with significantly stronger guarantees under the assumption that the low-degree moments of the covariate distribution are Sum-of-Squares (SoS) certifiably bounded. Specifically, for any constant $δ>0$, as long as the batch size is $n \geq Ω_δ(α^{-δ})$ and the degree-$Θ(1/δ)$ moments of the covariates are SoS certifiably bounded, our algorithm uses $m = \mathrm{poly}((dn)^{1/δ}, 1/α)$ batches, runs in polynomial time, and outputs an $O(1/α)$-sized list of vectors one of which is $O(α^{-δ/2}/\sqrt{n})$ close to the target. That is, our algorithm achieves substantially smaller minimum batch size and final error, while achieving the optimal list size. Our approach uses higher-order moment information by combining the SoS paradigm with an iterative method and a novel list-pruning procedure. In the process, we give an SoS proof of the Marcinkiewicz-Zygmund inequality that may be of broader applicability.
Robust Learning of Multi-index Models via Iterative Subspace Approximation
arXiv · 2025-02-13
preprint · Open access · 1st author · Corresponding
We study the task of learning Multi-Index Models (MIMs) with label noise under the Gaussian distribution. A $K$-MIM is any function $f$ that only depends on a $K$-dimensional subspace. We focus on well-behaved MIMs with finite ranges that satisfy certain regularity properties. Our main contribution is a general robust learner that is qualitatively optimal in the Statistical Query (SQ) model. Our algorithm iteratively constructs better approximations to the defining subspace by computing low-degree moments conditional on the projection to the subspace computed thus far, and adding directions with relatively large empirical moments. This procedure efficiently finds a subspace $V$ so that $f(\mathbf{x})$ is close to a function of the projection of $\mathbf{x}$ onto $V$. Conversely, for functions for which these conditional moments do not help, we prove an SQ lower bound suggesting that no efficient learner exists. As applications, we provide faster robust learners for the following concept classes:
* Multiclass Linear Classifiers. We give a constant-factor approximate agnostic learner with sample complexity $N = O(d) 2^{\mathrm{poly}(K/ε)}$ and computational complexity $\mathrm{poly}(N, d)$. This is the first constant-factor agnostic learner for this class whose complexity is a fixed-degree polynomial in $d$.
* Intersections of Halfspaces. We give an approximate agnostic learner for this class achieving 0-1 error $K \tilde{O}(\mathrm{OPT}) + ε$ with sample complexity $N = O(d^2) 2^{\mathrm{poly}(K/ε)}$ and computational complexity $\mathrm{poly}(N, d)$. This is the first agnostic learner for this class with near-linear error dependence and complexity a fixed-degree polynomial in $d$.
Furthermore, we show that in the presence of random classification noise, the complexity of our algorithm scales polynomially with $1/ε$.
Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation
Society for Industrial and Applied Mathematics eBooks · 2025-01-01
book chapter · 1st author · Corresponding
We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $\sum_{i=1}^k w_i P_i$, where each $w_i \geq α$ for some known parameter $α$, and each $P_i$ has unknown covariance $Σ_i \preceq σ_i^2 I_d$ for some unknown $σ_i$, the goal is to cluster the samples assuming a pairwise mean separation in the order of $(σ_i + σ_j)/\sqrt{α}$ between every pair of components $P_i$ and $P_j$. Our main contributions are as follows:
A Near-optimal Algorithm for Learning Margin Halfspaces with Massart Noise
arXiv · 2025-01-16
preprint · Open access · 1st author · Corresponding
We study the problem of PAC learning $γ$-margin halfspaces in the presence of Massart noise. Without computational considerations, the sample complexity of this learning problem is known to be $\widetilde{Θ}(1/(γ^2 ε))$. Prior computationally efficient algorithms for the problem incur sample complexity $\tilde{O}(1/(γ^4 ε^3))$ and achieve 0-1 error of $η+ε$, where $η<1/2$ is the upper bound on the noise rate. Recent work gave evidence of an information-computation tradeoff, suggesting that a quadratic dependence on $1/ε$ is required for computationally efficient algorithms. Our main result is a computationally efficient learner with sample complexity $\widetilde{Θ}(1/(γ^2 ε^2))$, nearly matching this lower bound. In addition, our algorithm is simple and practical, relying on online SGD on a carefully selected sequence of convex losses.
Linear Regression under Missing or Corrupted Coordinates
arXiv · 2025-09-23
preprint · Open access · 1st author · Corresponding
We study multivariate linear regression under Gaussian covariates in two settings, where data may be erased or corrupted by an adversary under a coordinate-wise budget. In the incomplete data setting, an adversary may inspect the dataset and delete entries in up to an $η$-fraction of samples per coordinate; this is a strong form of the Missing Not At Random model. In the corrupted data setting, the adversary instead replaces values arbitrarily, and the corruption locations are unknown to the learner. Despite substantial work on missing data, linear regression under such adversarial missingness remains poorly understood, even information-theoretically. Unlike the clean setting, where estimation error vanishes with more samples, here the optimal error remains a positive function of the problem parameters. Our main contribution is to characterize this error up to constant factors across essentially the entire parameter range. Specifically, we establish novel information-theoretic lower bounds on the achievable error that match the error of (computationally efficient) algorithms. A key implication is that, perhaps surprisingly, the optimal error in the missing data setting matches that in the corruption setting, so knowing the corruption locations offers no general advantage.
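A coordinate-wise budget differs from the more familiar budget that removes a fraction of whole samples. The check below records the adversary's choices as a boolean mask, with `True` marking an erased or corrupted entry. The mask representation and the function name are assumptions made for illustration. The example shows that every sample can lose an entry while each coordinate stays within budget, so discarding incomplete rows could discard all of the data.

```python
def respects_budget(mask, eta):
    """mask[i][j] is True if entry j of sample i was erased (or corrupted).
    The coordinate-wise budget allows at most an eta-fraction of samples per coordinate."""
    m = len(mask)
    d = len(mask[0])
    return all(sum(mask[i][j] for i in range(m)) <= eta * m for j in range(d))


# Erase the diagonal of a 4 x 4 data matrix with eta = 0.25.
m = d = 4
diagonal = [[i == j for j in range(d)] for i in range(m)]
print(respects_budget(diagonal, 0.25))    # True: one erasure per coordinate
print(all(any(row) for row in diagonal))  # True: yet every sample is affected
```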
SoS Certificates for Sparse Singular Values and Their Applications: Robust Statistics, Subspace Distortion, and More
2025-06-15 · 1 citation
article · 1st author · Corresponding

Recent grants

AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization
NSF · $233k · 2019–2022
CAREER: Efficient Algorithms for Learning and Testing Structured Probabilistic Models
NSF · $315k · 2017–2020

Frequent coauthors

Daniel M. Kane
216 shared
Alistair Stewart
107 shared
Rocco A. Servedio
81 shared
Jerry Li
University of Washington
57 shared
Ankur Moitra
Massachusetts Institute of Technology
53 shared
Gautam Kamath
University of Waterloo
51 shared
Nikos Zarifis
39 shared
Anindya De
University of Pennsylvania
35 shared

Education

B.S.
National Technical University of Athens
Ph.D., Computer Science
Columbia University

Awards & honors

ACM Grace Murray Hopper Award
Guggenheim Fellowship
Sloan Fellowship
NSF CAREER Award
Marie Curie Fellowship
