Kevin Jamieson

· ProfessorVerified

University of Washington · Computer Science & Engineering

Active 2009–2026

h-index27

Citations4.2k

Papers14977 last 5y

Funding$500k

Faculty page Lab page

See your match with Kevin Jamieson — sign in to PhdFit.Sign in

About

Kevin Jamieson is an Associate Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. He holds the title of Paul G. Allen Career Development Professor in Computer Science & Engineering and is also an Adjunct faculty member in the Department of Statistics. Jamieson received his B.S. in 2009 from the University of Washington, his M.S. in 2010 from Columbia University, and his Ph.D. in 2015 from the University of Wisconsin – Madison, all in electrical engineering. After completing a postdoctoral fellowship in the AMP lab at the University of California, Berkeley, working with Benjamin Recht, he returned to the University of Washington as faculty in 2017. His research focuses on leveraging already-collected data to inform future measurements in a closed-loop system, a process known as active learning. This approach allows for extracting richer insights than fixed measurement plans within the same statistical budget. Jamieson’s work spans from theoretical foundations to practical algorithms with guarantees, and includes open-source machine learning systems. His research has been applied in various domains such as measuring human perception in psychology studies, adaptive A/B/n testing in dynamic web environments, numerical optimization, and hyperparameter selection for deep neural networks. His contributions to the field have been recognized through awards including an NSF CAREER award and an Amazon Faculty Research award.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Algorithm
Mathematical optimization
Mathematics
Combinatorics
Discrete mathematics
Information Retrieval
World Wide Web
Engineering
Multimedia
Data science
Parallel computing
Statistics

Selected publications

Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback
arXiv (Cornell University) · 2026-02-24
articleOpen access
In this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging setting where players select actions from compact convex sets and receive only bandit feedback. Our main contribution is the design of an uncoupled learning algorithm that guarantees last-iterate convergence to the Nash equilibrium with high probability. We establish a convergence rate of $\tilde{O}(T^{-1/4})$ up to polynomial factors in problem parameters. Crucially, our proposed algorithm is computationally efficient, requiring only an efficient linear optimization oracle over the players' compact action sets. The algorithm is obtained by combining techniques from experimental design and the classic Follow-The-Regularized-Leader (FTRL) framework, with a carefully chosen regularizer function tailored to the geometry of the action set of each learner.
Publisher OA PDF
A reaction-aware autonomous platform co-optimizing formulation and execution for the synthesis of zeolitic imidazolate framework
ChemRxiv · 2026-02-23
articleOpen access
Autonomous experimentation is transforming materials discovery, yet most platforms rely on endof-run measurements, ignoring reaction-progress information and imposing rigid execution protocols. Here, we present Axo, a reaction-aware batch autonomous platform that integrates timeresolved in situ optical monitoring into liquid-handling synthesis. Each run is recorded as a reaction trajectory and converted into two kinetic descriptors: the extent of reaction that adapts measurement interval and batch size to match the reaction timescale, and the nucleation rate constant (𝑘 𝑛 ) that guides the Bayesian optimization of formulation variables. Applied to the solution-phase crystallization of zeolitic imidazolate framework-67 (ZIF-67 , Axo identified a rapid-nucleation region with 𝑘 𝑛 = 2.33 min -1 , exceeding the best condition found by static maximin sampling by over five-fold. This work demonstrates how programmable in situ trajectories can be converted into actionable feedback that enables reaction-aware closed-loop optimization to cooptimize what to test and how to measure in solution-phase materials synthesis.
Publisher DOI
Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning
Open MIND · 2026-03-03
preprintSenior author
We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces, $H$ is the time horizon, $K$ is the number of episodes, and $D_{\max}$ is the maximum length of the delay. We also provide a matching lower bound up to logarithmic factors, showing the optimality of our approach. Our analytical framework formulates this problem as a special case of a broader class of MDPs, where their transition dynamics decompose into a known component and an unknown but structured component. We establish general results for this abstract setting, which may be of independent interest.
DOI
Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners
Open MIND · 2026-02-25
preprint
We study the discrete Bertrand pricing game with a non-increasing demand function. The game has $n \ge 2$ players who simultaneously choose prices from the set $\{1/k, 2/k, \ldots, 1\}$, where $k\in\mathbb{N}$. The player who sets the lowest price captures the entire demand; if multiple players tie for the lowest price, they split the demand equally. We study the Bertrand paradox, where classical theory predicts low prices, yet real markets often sustain high prices. To understand this gap, we analyze a repeated-game model in which firms set prices using no-regret learners. Our goal is to characterize the equilibrium outcomes that can arise under different no-regret learning guarantees. We are particularly interested in questions such as whether no-external-regret learners can converge to undesirable high-price outcomes, and how stronger guarantees such as no-swap regret shape the emergence of competitive low-price behavior. We address these and related questions through a theoretical analysis, complemented by experiments that support the theory and reveal surprising phenomena for no-swap regret learners.
DOI
On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
arXiv (Cornell University) · 2026-03-11
articleOpen access
We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace θ_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T θ_t$ with high probability. In this setting, it is well-known that uniformly sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-Θ\left(T / H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an arm-set-dependent lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the Adjacent-optimal design, a specialization of the well-known $\mathcal{X}\mathcal{Y}$-optimal design, and develop the $\textsf{Adjacent-BAI}$ algorithm. We prove that the error probability of $\textsf{Adjacent-BAI}$ matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.
Publisher OA PDF
Efficient Uncoupled Learning Dynamics with $\tilde{O}\!\left(T^{-1/4}\right)$ Last-Iterate Convergence in Bilinear Saddle-Point Problems over Convex Sets under Bandit Feedback
Open MIND · 2026-02-24
preprint
In this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging setting where players select actions from compact convex sets and receive only bandit feedback. Our main contribution is the design of an uncoupled learning algorithm that guarantees last-iterate convergence to the Nash equilibrium with high probability. We establish a convergence rate of $\tilde{O}(T^{-1/4})$ up to polynomial factors in problem parameters. Crucially, our proposed algorithm is computationally efficient, requiring only an efficient linear optimization oracle over the players' compact action sets. The algorithm is obtained by combining techniques from experimental design and the classic Follow-The-Regularized-Leader (FTRL) framework, with a carefully chosen regularizer function tailored to the geometry of the action set of each learner.
DOI
On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
arXiv (Cornell University) · 2026-03-11
preprintOpen access
We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace θ_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T θ_t$ with high probability. In this setting, it is well-known that uniformly sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-Θ\left(T / H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an arm-set-dependent lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the Adjacent-optimal design, a specialization of the well-known $\mathcal{X}\mathcal{Y}$-optimal design, and develop the $\textsf{Adjacent-BAI}$ algorithm. We prove that the error probability of $\textsf{Adjacent-BAI}$ matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.
Publisher DOI
Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners
arXiv (Cornell University) · 2026-02-25
articleOpen access
We study the discrete Bertrand pricing game with a non-increasing demand function. The game has $n \ge 2$ players who simultaneously choose prices from the set $\{1/k, 2/k, \ldots, 1\}$, where $k\in\mathbb{N}$. The player who sets the lowest price captures the entire demand; if multiple players tie for the lowest price, they split the demand equally. We study the Bertrand paradox, where classical theory predicts low prices, yet real markets often sustain high prices. To understand this gap, we analyze a repeated-game model in which firms set prices using no-regret learners. Our goal is to characterize the equilibrium outcomes that can arise under different no-regret learning guarantees. We are particularly interested in questions such as whether no-external-regret learners can converge to undesirable high-price outcomes, and how stronger guarantees such as no-swap regret shape the emergence of competitive low-price behavior. We address these and related questions through a theoretical analysis, complemented by experiments that support the theory and reveal surprising phenomena for no-swap regret learners.
Publisher OA PDF
Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning
ArXiv.org · 2026-03-03
articleOpen accessSenior author
We study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces, $H$ is the time horizon, $K$ is the number of episodes, and $D_{\max}$ is the maximum length of the delay. We also provide a matching lower bound up to logarithmic factors, showing the optimality of our approach. Our analytical framework formulates this problem as a special case of a broader class of MDPs, where their transition dynamics decompose into a known component and an unknown but structured component. We establish general results for this abstract setting, which may be of independent interest.
Publisher OA PDF
High Effort, Low Gain: Fundamental Limits of Active Learning for Linear Dynamical Systems
ArXiv.org · 2025-09-15
preprintOpen access
In this work, we consider the problem of identifying an unknown linear dynamical system given a finite hypothesis class. In particular, we analyze the effect of the excitation input on the sample complexity of identifying the true system with high probability. To this end, we present sample complexity lower bounds that capture the choice of the selected excitation input. The sample complexity lower bound gives rise to a system theoretic condition to determine the potential benefit of experiment design. Informed by the analysis of the sample complexity lower bound, we propose a persistent excitation (PE) condition tailored to the considered setting, which we then use to establish sample complexity upper bounds. Notably, the PE condition is weaker than in the case of an infinite hypothesis class and allows analyzing different excitation inputs modularly. Crucially, the lower and upper bounds share the same dependency on key problem parameters. Finally, we leverage these insights to propose an active learning algorithm that sequentially excites the system optimally with respect to the current estimate, and provide sample complexity guarantees for the presented algorithm. Concluding simulations showcase the effectiveness of the proposed algorithm.
Publisher OA PDF DOI

Recent grants

CIF: Small: Online Learning and Optimal Experiment Design with a Budget
NSF · $500k · 2020–2024

Frequent coauthors

Lalit Jain
33 shared
Robert D. Nowak
22 shared
Andrew Wagenmaker
22 shared
Max Simchowitz
19 shared
Maya R. Gupta
17 shared
Simon S. Du
16 shared
Hyrum S. Anderson
Robust Chip (United States)
16 shared
Julian Katz-Samuels
15 shared

Labs

Kevin Jamieson's LabPI
Not provided

Education

B.S., Electrical Engineering
University of Washington
2009
M.S., Electrical Engineering
Columbia University
2010
Ph.D., Electrical Engineering
University of Wisconsin - Madison
2015

Awards & honors

NSF CAREER award
Amazon Faculty Research award

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Kevin Jamieson

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you