Nicholas Bambos

Richard W. Weiland Professor in the School of Engineering and Professor of Electrical Engineering

Stanford University · Management Science and Engineering

Active 1989–2026

h-index: 36

Citations: 5.5k

Papers: 384 (51 in the last 5 years)

Funding: not listed


About

Nicholas Bambos is the Richard W. Weiland Professor in the School of Engineering at Stanford University, with a joint appointment in the Department of Electrical Engineering and the Department of Management Science & Engineering. He served as the Fortinet Founders Department Chair of Management Science & Engineering from 2016 to 2020. He heads the Computer Systems Performance Engineering Lab (Perf-Lab) at Stanford, where doctoral students and industry visitors work on a range of research projects. He was also Director of the Stanford Networking Research Center from 1999 to 2005, overseeing a research initiative of about $30 million.

His research covers the architecture and high-performance engineering of computer systems and networks, as well as data analytics with a focus on medical and healthcare analytics. His contributions span networking and the Internet, cloud computing, data centers, multimedia streaming, computer security, and digital health. His methodological expertise includes network control, online task scheduling, routing, distributed processing, machine learning, and artificial intelligence.

Bambos earned his Ph.D. in Electrical Engineering & Computer Sciences from the University of California at Berkeley in 1989. Before joining Stanford, he was an assistant professor and then a tenured associate professor at UCLA. He has authored over 300 peer-reviewed research publications and graduated more than 40 doctoral students, who have moved into leadership roles across academia, industry, finance, and startups. His awards include best research paper awards, the Cisco Systems Faculty Development Chair, the David Morgenthaler Faculty Scholar, the IBM Faculty Award, and the National Young Investigator Award from the NSF. He has served on editorial boards, scientific committees, and technical review panels, and has worked as a consultant, startup co-founder, and expert witness in legal cases involving information technologies.

Research topics

Computer Science
Machine Learning
Mathematics
Geometry
Chemistry
Engineering
Internal medicine
Applied mathematics
Mathematical analysis
Mathematical optimization
Surgery
Medicine
Process engineering
Chromatography

Selected publications

High-Probability Bounds for SGD under the Polyak-Lojasiewicz Condition with Markovian Noise
arXiv (Cornell University) · 2026-03-15
Preprint · Open access · Senior author
We present the first uniform-in-time high-probability bound for SGD under the PL condition, where the gradient noise contains both Markovian and martingale difference components. This significantly broadens the scope of finite-time guarantees, as the PL condition arises in many machine learning and deep learning models while Markovian noise naturally arises in decentralized optimization and online system identification problems. We further allow the magnitude of noise to grow with the function value, enabling the analysis of many practical sampling strategies. In addition to the high-probability guarantee, we establish a matching $1/k$ decay rate for the expected suboptimality. Our proof technique relies on the Poisson equation to handle the Markovian noise and a probabilistic induction argument to address the lack of almost-sure bounds on the objective. Finally, we demonstrate the applicability of our framework by analyzing three practical optimization problems: token-based decentralized linear regression, supervised learning with subsampling for privacy amplification, and online system identification.
Last-Iterate Guarantees for Learning in Co-coercive Games
arXiv (Cornell University) · 2026-04-21
Preprint · Open access · Senior author
We establish finite-time last-iterate guarantees for vanilla stochastic gradient descent in co-coercive games under noisy feedback. This is a broad class of games that is more general than strongly monotone games, allows for multiple Nash equilibria, and includes examples such as quadratic games with negative semidefinite interaction matrices and potential games with smooth concave potentials. Prior work in this setting has relied on relative noise models, where the noise vanishes as iterates approach equilibrium, an assumption that is often unrealistic in practice. We work instead under a substantially more general noise model in which the second moment of the noise is allowed to scale affinely with the squared norm of the iterates, an assumption natural in learning with unbounded action spaces. Under this model, we prove a last-iterate bound of order $O(\log(t)/t^{1/3})$, the first such bound for co-coercive games under non-vanishing noise. We additionally establish almost sure convergence of the iterates to the set of Nash equilibria and derive time-average convergence guarantees.
Social Federated Learning (SFL): Leveraging Shared Data to Boost Learning Performance
SSRN Electronic Journal · 2026-01-01
Preprint · Open access
Regret and Sample Complexity of Online Q-Learning via Concentration of Stochastic Approximation with Time-Inhomogeneous Markov Chains
Open MIND · 2026-02-18
Preprint · Senior author
We present the first regret bound for classical online Q-learning in infinite-horizon discounted Markov decision processes (MDPs), without relying on optimism or bonus terms. We first analyze Boltzmann Q-learning with decaying temperature and show that its regret depends critically on the suboptimality gap of the MDP: for sufficiently large gaps, the regret is sublinear, while for small gaps it deteriorates and can approach linear growth. To address this limitation, we study a Smoothed $ε_n$-Greedy exploration scheme that combines $ε_n$-greedy and Boltzmann exploration, for which we prove a gap-robust regret bound of near-$\tilde{O}(N^{9/10})$. We also obtain sample complexity guarantees, with both regret and sample complexity bounds holding with high probability. To analyze these algorithms, we develop a high-probability concentration bound for contractive Markovian stochastic approximation with iterate- and time-dependent transition dynamics. This bound may be of independent interest as the contraction factor in our framework is allowed to converge to one asymptotically.
Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis
ArXiv.org · 2026-03-20
Article · Open access · Senior author
Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyses typically rely on martingale difference or Markov noise with bounded second moments, but many practical settings, including finance and communications, frequently encounter heavy-tailed and long-range dependent (LRD) noise. In this work, we study SA for finding the root of a strongly monotone operator under these non-classical noise models. We establish the first finite-time moment bounds in both settings, providing explicit convergence rates that quantify the impact of heavy tails and temporal dependence. Our analysis employs a noise-averaging argument that regularizes the impact of noise without modifying the iteration. Finally, we apply our general framework to stochastic gradient descent (SGD) and gradient play, and corroborate our finite-time analysis through numerical experiments.
Policy Gradient Methods for Non-Markovian Reinforcement Learning
arXiv (Cornell University) · 2026-05-11
Preprint · Open access · Senior author
We study policy gradient methods for reinforcement learning in non-Markovian decision processes (NMDPs), where observations and rewards depend on the entire interaction history. To handle this dependence, the agent maintains an internal state that is recursively updated to provide a compact summary of past observations and actions. In contrast to approaches that treat the agent state dynamics as fixed or learn it via predictive objectives, we propose a reward-centric formulation that jointly optimizes the agent state dynamics and the control policy to maximize the expected cumulative reward. To this end, we consider a class of Agent State-Markov (ASM) policies, comprising an agent state dynamics and a control policy that maps the agent state to actions. We establish a novel policy gradient theorem for ASM policies, extending the classical policy gradient results from the Markovian setting to episodic and infinite-horizon discounted NMDPs. Building on this gradient expression, we propose the Agent State-Markov Policy Gradient (ASMPG) algorithm, which leverages the recursive structure of the agent state dynamics for efficient optimization. We establish finite-time and almost sure convergence guarantees, and empirically demonstrate that, on a range of non-Markovian tasks, ASMPG outperforms baselines that learn state representations via predictive objectives.

Frequent coauthors

Zhengyuan Zhou
Fu Wai Hospital
56 shared
Panayotis Mertikopoulos
Laboratoire d'Informatique de Grenoble
45 shared
Neal Master
35 shared
Ilai Bistritz
Tel Aviv University
27 shared
Aditya Dua
Proteus Digital Health
24 shared
Carri W. Chan
24 shared
Peter W. Glynn
24 shared
Daniel Miller
Icahn School of Medicine at Mount Sinai
22 shared

Education

Ph.D., Electrical Engineering & Computer Sciences
University of California, Berkeley
1989
M.S., Electrical Engineering
Stanford University
1985
B.S., Electrical Engineering and Computer Science
University of California, Berkeley
1983

Awards & honors

Cisco Systems Faculty Development Chair
David Morgenthaler Faculty Scholar
IBM Faculty Award
National Young Investigator Award
Research Initiation Award from the National Science Foundati…
