Zaid Harchaoui

Professor · Verified

University of Washington · Statistics

Active 2004–2026

h-index: 62

Citations: 15.9k

Papers: 285 (101 in the last 5 years)

Funding: $600k


About

Zaid Harchaoui is a professor in the Department of Statistics at the University of Washington. His research focuses on statistical and machine learning methods. He is an elected member of the International Statistical Institute (ISI) and a council member of the French Machine Learning Society. His honors include the Criteo Faculty Research Award, the Google Faculty Research Award, and an Outstanding Paper Award from Neural Information Processing Systems (NeurIPS).

Research topics

Artificial Intelligence
Computer Science
Natural Language Processing
Data Mining
Philosophy
Linguistics

Selected publications

Stochastic optimization on matrices and a graphon McKean–Vlasov limit
The Annals of Applied Probability · 2026-02-01
article · 1st author · Corresponding
Langevin diffusion approximation to same marginal Schrödinger bridge
Journal of Functional Analysis · 2026-03-31
article
A Generalization Theory for Zero-Shot Prediction
ArXiv.org · 2025-07-12
preprint · Open access · Senior author
A modern paradigm for generalization in machine learning and AI consists of pre-training a task-agnostic foundation model, generally obtained using self-supervised and multimodal contrastive learning. The resulting representations can be used for prediction on a downstream task for which no labeled data is available. We present a theoretical framework to better understand this approach, called zero-shot prediction. We identify the target quantities that zero-shot prediction aims to learn, or learns in passing, and the key conditional independence relationships that enable its generalization ability.
Supervised Stochastic Gradient Algorithms for Multi-Trial Source Separation
ArXiv.org · 2025-08-28
preprint · Open access · Senior author
We develop a stochastic algorithm for independent component analysis that incorporates multi-trial supervision, which is available in many scientific contexts. The method blends a proximal gradient-type algorithm in the space of invertible matrices with joint learning of a prediction model through backpropagation. We illustrate the proposed algorithm on synthetic and real data experiments. In particular, owing to the additional supervision, we observe an increased success rate of the non-convex optimization and the improved interpretability of the independent components.
Leveraging machine learning and accelerometry to classify animal behaviours with uncertainty
Methods in Ecology and Evolution · 2025-12-08 · 2 citations
article · Open access · Senior author
Animal‐worn sensors have revolutionised the study of animal behaviour and ecology. Accelerometers, which measure changes in acceleration across planes of movement, are increasingly being used in conjunction with machine learning models to classify animal behaviours across taxa and research questions. However, the widespread adoption of these methods faces challenges from imbalanced training data, unquantified uncertainties in model outputs, shifts in model performance across contexts and noisy classifications in continuous data streams, where predicted behaviours change abruptly within a sequence. To address these challenges, we introduce an open‐source approach for classifying animal behaviour from raw acceleration data. Our approach integrates machine learning and statistical inference techniques to evaluate and mitigate class imbalances, changes in model performance across ecological settings and noisy classifications. Importantly, we extend predictions from single behaviour classifications to prediction sets: sets of behaviour labels guaranteed to contain the true behaviour with a pre‐specified probability, in a framework analogous to the use of prediction intervals in statistical analyses. We evaluate our approach via simulation and highlight its utility using data collected from a free‐ranging large carnivore, African wild dogs (Lycaon pictus), in the Okavango Delta, Botswana. We demonstrate significantly improved predictions along with associated uncertainty metrics in African wild dog behaviour classification, particularly for rare and ecologically important behaviours such as feeding, where correct classifications more than doubled following quality checks and data rebalancing introduced in our pipeline. Our approach is applicable across taxa and represents a key step towards advancing the burgeoning use of machine learning to remotely observe around‐the‐clock behaviours of free‐ranging animals. Future work could include the integration of multiple data streams, such as accelerometer, audio and GPS data, for model training and could be incorporated directly into our pipeline.
Min-Max Optimization with Dual-Linear Coupling
ArXiv.org · 2025-07-08
preprint · Open access · Senior author
We study a class of convex-concave min-max problems in which the coupled component of the objective is linear in at least one of the two decision vectors. We identify such problem structure as interpolating between the bilinearly and nonbilinearly coupled problems, motivated by key applications in areas such as distributionally robust optimization and convex optimization with functional constraints. Leveraging the considered nonlinear-linear coupling of the primal and the dual decision vectors, we develop a general algorithmic framework leading to fine-grained complexity bounds exploiting separability properties of the problem, whenever present. The obtained complexity bounds offer potential improvements over state-of-the-art scaling with $\sqrt{n}$ or $n$ in some of the considered problem settings, which even include bilinearly coupled problems, where $n$ is the dimension of the dual decision vector. On the algorithmic front, our work provides novel strategies for combining randomization with extrapolation and multi-point anchoring in the mirror descent-style updates in the primal and the dual, which we hope will find further applications in addressing related optimization problems.
Generative AI as a tool to accelerate the field of ecology
Nature Ecology & Evolution · 2025-01-29 · 29 citations
review
Langevin Diffusion Approximation to Same Marginal Schrödinger Bridge
arXiv (Cornell University) · 2025-05-12
preprint · Open access
We introduce a novel approximation to the same marginal Schrödinger bridge using the Langevin diffusion. As $\varepsilon \downarrow 0$, it is known that the barycentric projection (also known as the entropic Brenier map) of the Schrödinger bridge converges to the Brenier map, which is the identity. Our diffusion approximation is leveraged to show that, under suitable assumptions, the difference between the two is $\varepsilon$ times the gradient of the marginal log density (i.e., the score function), in $\mathbf{L}^2$. More generally, we show that the family of Markov operators, indexed by $\varepsilon > 0$, derived from integrating test functions against the conditional density of the static Schrödinger bridge at temperature $\varepsilon$, admits a derivative at $\varepsilon=0$ given by the generator of the Langevin semigroup. Hence, these operators satisfy an approximate semigroup property at low temperatures.
Author response for "Leveraging machine learning and accelerometry to classify animal behaviours with uncertainty"
2025-07-03
peer-review · Senior author
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
arXiv (Cornell University) · 2024-06-24 · 2 citations
preprint · Open access · Senior author
One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.

Recent grants

TRIPODS+X:RES: Safe Imitation Learning for Robotics
NSF · $600k · 2018–2022

Frequent coauthors

Cordelia Schmid
131 shared
Julien Mairal
66 shared
Jérôme Revaud
54 shared
Vincent Roulet
45 shared
Yury Maximov
Los Alamos National Laboratory
41 shared
Massih-Reza Amini
40 shared
Philippe Weinzaepfel
37 shared
Jérôme Malick
32 shared

Awards & honors

Council Member, French Machine Learning Society (2022)
Criteo Faculty Research Award, Criteo AI Lab (2017)
Elected Member, International Statistical Institute (ISI) (2…
Google Faculty Research Award, Google (2018)
Outstanding Paper Award, Neural Information Processing Syste…