Kevin Jamieson
· ProfessorVerifiedUniversity of Washington · Computer Science & Engineering
Active 2009–2026
About
Kevin Jamieson is an Associate Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. He holds the title of Paul G. Allen Career Development Professor in Computer Science & Engineering and is also an Adjunct faculty member in the Department of Statistics. Jamieson received his B.S. in 2009 from the University of Washington, his M.S. in 2010 from Columbia University, and his Ph.D. in 2015 from the University of Wisconsin – Madison, all in electrical engineering. After completing a postdoctoral fellowship in the AMP lab at the University of California, Berkeley, working with Benjamin Recht, he returned to the University of Washington as faculty in 2017. His research focuses on leveraging already-collected data to inform future measurements in a closed-loop system, a process known as active learning. This approach allows for extracting richer insights than fixed measurement plans within the same statistical budget. Jamieson’s work spans from theoretical foundations to practical algorithms with guarantees, and includes open-source machine learning systems. His research has been applied in various domains such as measuring human perception in psychology studies, adaptive A/B/n testing in dynamic web environments, numerical optimization, and hyperparameter selection for deep neural networks. His contributions to the field have been recognized through awards including an NSF CAREER award and an Amazon Faculty Research award.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Algorithm
- Mathematical optimization
- Mathematics
- Combinatorics
- Discrete mathematics
- Information Retrieval
- World Wide Web
- Engineering
- Multimedia
- Data science
- Parallel computing
- Statistics
Selected publications
arXiv (Cornell University) · 2026-02-24
articleOpen accessIn this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging setting where players select actions from compact convex sets and receive only bandit feedback. Our main contribution is the design of an uncoupled learning algorithm that guarantees last-iterate convergence to the Nash equilibrium with high probability. We establish a convergence rate of $\tilde{O}(T^{-1/4})$ up to polynomial factors in problem parameters. Crucially, our proposed algorithm is computationally efficient, requiring only an efficient linear optimization oracle over the players' compact action sets. The algorithm is obtained by combining techniques from experimental design and the classic Follow-The-Regularized-Leader (FTRL) framework, with a carefully chosen regularizer function tailored to the geometry of the action set of each learner.
ChemRxiv · 2026-02-23
articleOpen accessAutonomous experimentation is transforming materials discovery, yet most platforms rely on endof-run measurements, ignoring reaction-progress information and imposing rigid execution protocols. Here, we present Axo, a reaction-aware batch autonomous platform that integrates timeresolved in situ optical monitoring into liquid-handling synthesis. Each run is recorded as a reaction trajectory and converted into two kinetic descriptors: the extent of reaction that adapts measurement interval and batch size to match the reaction timescale, and the nucleation rate constant (𝑘 𝑛 ) that guides the Bayesian optimization of formulation variables. Applied to the solution-phase crystallization of zeolitic imidazolate framework-67 (ZIF-67 , Axo identified a rapid-nucleation region with 𝑘 𝑛 = 2.33 min -1 , exceeding the best condition found by static maximin sampling by over five-fold. This work demonstrates how programmable in situ trajectories can be converted into actionable feedback that enables reaction-aware closed-loop optimization to cooptimize what to test and how to measure in solution-phase materials synthesis.
Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning
Open MIND · 2026-03-03
preprintSenior authorWe study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces, $H$ is the time horizon, $K$ is the number of episodes, and $D_{\max}$ is the maximum length of the delay. We also provide a matching lower bound up to logarithmic factors, showing the optimality of our approach. Our analytical framework formulates this problem as a special case of a broader class of MDPs, where their transition dynamics decompose into a known component and an unknown but structured component. We establish general results for this abstract setting, which may be of independent interest.
Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners
Open MIND · 2026-02-25
preprintWe study the discrete Bertrand pricing game with a non-increasing demand function. The game has $n \ge 2$ players who simultaneously choose prices from the set $\{1/k, 2/k, \ldots, 1\}$, where $k\in\mathbb{N}$. The player who sets the lowest price captures the entire demand; if multiple players tie for the lowest price, they split the demand equally. We study the Bertrand paradox, where classical theory predicts low prices, yet real markets often sustain high prices. To understand this gap, we analyze a repeated-game model in which firms set prices using no-regret learners. Our goal is to characterize the equilibrium outcomes that can arise under different no-regret learning guarantees. We are particularly interested in questions such as whether no-external-regret learners can converge to undesirable high-price outcomes, and how stronger guarantees such as no-swap regret shape the emergence of competitive low-price behavior. We address these and related questions through a theoretical analysis, complemented by experiments that support the theory and reveal surprising phenomena for no-swap regret learners.
On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
arXiv (Cornell University) · 2026-03-11
articleOpen accessWe study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace θ_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T θ_t$ with high probability. In this setting, it is well-known that uniformly sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-Θ\left(T / H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an arm-set-dependent lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the Adjacent-optimal design, a specialization of the well-known $\mathcal{X}\mathcal{Y}$-optimal design, and develop the $\textsf{Adjacent-BAI}$ algorithm. We prove that the error probability of $\textsf{Adjacent-BAI}$ matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.
Open MIND · 2026-02-24
preprintIn this paper, we study last-iterate convergence of learning algorithms in bilinear saddle-point problems, a preferable notion of convergence that captures the day-to-day behavior of learning dynamics. We focus on the challenging setting where players select actions from compact convex sets and receive only bandit feedback. Our main contribution is the design of an uncoupled learning algorithm that guarantees last-iterate convergence to the Nash equilibrium with high probability. We establish a convergence rate of $\tilde{O}(T^{-1/4})$ up to polynomial factors in problem parameters. Crucially, our proposed algorithm is computationally efficient, requiring only an efficient linear optimization oracle over the players' compact action sets. The algorithm is obtained by combining techniques from experimental design and the classic Follow-The-Regularized-Leader (FTRL) framework, with a carefully chosen regularizer function tailored to the geometry of the action set of each learner.
On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits
arXiv (Cornell University) · 2026-03-11
preprintOpen accessWe study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace θ_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T θ_t$ with high probability. In this setting, it is well-known that uniformly sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-Θ\left(T / H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an arm-set-dependent lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the Adjacent-optimal design, a specialization of the well-known $\mathcal{X}\mathcal{Y}$-optimal design, and develop the $\textsf{Adjacent-BAI}$ algorithm. We prove that the error probability of $\textsf{Adjacent-BAI}$ matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.
Revisiting the Bertrand Paradox via Equilibrium Analysis of No-regret Learners
arXiv (Cornell University) · 2026-02-25
articleOpen accessWe study the discrete Bertrand pricing game with a non-increasing demand function. The game has $n \ge 2$ players who simultaneously choose prices from the set $\{1/k, 2/k, \ldots, 1\}$, where $k\in\mathbb{N}$. The player who sets the lowest price captures the entire demand; if multiple players tie for the lowest price, they split the demand equally. We study the Bertrand paradox, where classical theory predicts low prices, yet real markets often sustain high prices. To understand this gap, we analyze a repeated-game model in which firms set prices using no-regret learners. Our goal is to characterize the equilibrium outcomes that can arise under different no-regret learning guarantees. We are particularly interested in questions such as whether no-external-regret learners can converge to undesirable high-price outcomes, and how stronger guarantees such as no-swap regret shape the emergence of competitive low-price behavior. We address these and related questions through a theoretical analysis, complemented by experiments that support the theory and reveal surprising phenomena for no-swap regret learners.
Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning
ArXiv.org · 2026-03-03
articleOpen accessSenior authorWe study reinforcement learning with delayed state observation, where the agent observes the current state after some random number of time steps. We propose an algorithm that combines the augmentation method and the upper confidence bound approach. For tabular Markov decision processes (MDPs), we derive a regret bound of $\tilde{\mathcal{O}}(H \sqrt{D_{\max} SAK})$, where $S$ and $A$ are the cardinalities of the state and action spaces, $H$ is the time horizon, $K$ is the number of episodes, and $D_{\max}$ is the maximum length of the delay. We also provide a matching lower bound up to logarithmic factors, showing the optimality of our approach. Our analytical framework formulates this problem as a special case of a broader class of MDPs, where their transition dynamics decompose into a known component and an unknown but structured component. We establish general results for this abstract setting, which may be of independent interest.
High Effort, Low Gain: Fundamental Limits of Active Learning for Linear Dynamical Systems
ArXiv.org · 2025-09-15
preprintOpen accessIn this work, we consider the problem of identifying an unknown linear dynamical system given a finite hypothesis class. In particular, we analyze the effect of the excitation input on the sample complexity of identifying the true system with high probability. To this end, we present sample complexity lower bounds that capture the choice of the selected excitation input. The sample complexity lower bound gives rise to a system theoretic condition to determine the potential benefit of experiment design. Informed by the analysis of the sample complexity lower bound, we propose a persistent excitation (PE) condition tailored to the considered setting, which we then use to establish sample complexity upper bounds. Notably, the PE condition is weaker than in the case of an infinite hypothesis class and allows analyzing different excitation inputs modularly. Crucially, the lower and upper bounds share the same dependency on key problem parameters. Finally, we leverage these insights to propose an active learning algorithm that sequentially excites the system optimally with respect to the current estimate, and provide sample complexity guarantees for the presented algorithm. Concluding simulations showcase the effectiveness of the proposed algorithm.
Recent grants
CIF: Small: Online Learning and Optimal Experiment Design with a Budget
NSF · $500k · 2020–2024
Frequent coauthors
- 33 shared
Lalit Jain
- 22 shared
Robert D. Nowak
- 22 shared
Andrew Wagenmaker
- 19 shared
Max Simchowitz
- 17 shared
Maya R. Gupta
- 16 shared
Simon S. Du
- 16 shared
Hyrum S. Anderson
Robust Chip (United States)
- 15 shared
Julian Katz-Samuels
Labs
Not provided
Education
- 2009
B.S., Electrical Engineering
University of Washington
- 2010
M.S., Electrical Engineering
Columbia University
- 2015
Ph.D., Electrical Engineering
University of Wisconsin - Madison
Awards & honors
- NSF CAREER award
- Amazon Faculty Research award
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Kevin Jamieson
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup