
Nicholas Bambos
Richard W. Weiland Professor in the School of Engineering and Professor of Electrical Engineering
Stanford University · Management Science and Engineering
Active 1989–2026
About
Nicholas Bambos is the Richard W. Weiland Professor in the School of Engineering at Stanford University, with a joint appointment in the Department of Electrical Engineering and the Department of Management Science & Engineering. He served as the Fortinet Founders Department Chair of the Management Science & Engineering Department from 2016 to 2020. He heads the Computer Systems Performance Engineering Lab (Perf-Lab) at Stanford, where doctoral students and industry visitors work on a range of research projects. He was also the Director of the Stanford Networking Research Center from 1999 to 2005, overseeing a research initiative of about $30 million.
His research interests encompass the architecture and high-performance engineering of computer systems and networks, as well as data analytics with a focus on medical and healthcare analytics. His contributions span networking and the Internet, cloud computing, data centers, multimedia streaming, computer security, and digital health. His methodological expertise includes network control, online task scheduling, routing, distributed processing, machine learning, and artificial intelligence.
Bambos earned his Ph.D. in Electrical Engineering & Computer Sciences from the University of California at Berkeley in 1989. Prior to Stanford, he was an assistant and then tenured associate professor at UCLA. He has authored over 300 peer-reviewed research publications and graduated more than 40 doctoral students who have moved into leadership roles across academia, industry, finance, and startups. He has received numerous awards, including best research paper awards, the Cisco Systems Faculty Development Chair, the David Morgenthaler Faculty Scholar, the IBM Faculty Award, and the National Young Investigator Award from the NSF.
He has served on editorial boards, scientific committees, and technical review panels, and has worked as a consultant, startup co-founder, and expert witness in legal cases involving information technologies.
Research topics
- Computer Science
- Machine Learning
- Mathematics
- Geometry
- Chemistry
- Engineering
- Internal medicine
- Applied mathematics
- Mathematical analysis
- Mathematical optimization
- Surgery
- Medicine
- Process engineering
- Chromatography
Selected publications
High-Probability Bounds for SGD under the Polyak-Lojasiewicz Condition with Markovian Noise
arXiv (Cornell University) · 2026-03-15
preprint · Open access · Senior author
We present the first uniform-in-time high-probability bound for SGD under the PL condition, where the gradient noise contains both Markovian and martingale difference components. This significantly broadens the scope of finite-time guarantees, as the PL condition arises in many machine learning and deep learning models while Markovian noise naturally arises in decentralized optimization and online system identification problems. We further allow the magnitude of noise to grow with the function value, enabling the analysis of many practical sampling strategies. In addition to the high-probability guarantee, we establish a matching $1/k$ decay rate for the expected suboptimality. Our proof technique relies on the Poisson equation to handle the Markovian noise and a probabilistic induction argument to address the lack of almost-sure bounds on the objective. Finally, we demonstrate the applicability of our framework by analyzing three practical optimization problems: token-based decentralized linear regression, supervised learning with subsampling for privacy amplification, and online system identification.
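As a rough illustration of the setting this abstract describes, the sketch below runs SGD on f(x) = x^2 / 2, which satisfies the PL condition with constant 1, using gradient noise driven by a two-state Markov chain. The objective, the chain, the 1/k step size, and every name here are illustrative assumptions; none of it is taken from the paper.

```python
import random

def sgd_pl_markov(steps=20000, x0=5.0, stay_prob=0.9, seed=0):
    """SGD on f(x) = x^2 / 2 with zero-mean two-state Markov gradient noise.

    Returns the list of suboptimality gaps f(x_k) - f*, where f* = 0.
    """
    rng = random.Random(seed)
    x = x0
    state = 1  # Markov chain on {-1, +1}; symmetric, so its stationary mean is 0
    gaps = []
    for k in range(1, steps + 1):
        # The chain keeps its state with probability stay_prob, else flips,
        # so consecutive noise samples are positively correlated.
        if rng.random() > stay_prob:
            state = -state
        noisy_grad = x + state   # true gradient x plus Markovian noise
        x -= noisy_grad / k      # step size 1/k (assumed schedule)
        gaps.append(0.5 * x * x)
    return gaps

gaps = sgd_pl_markov()
print(f"gap after 100 steps:   {gaps[99]:.5f}")
print(f"gap after 20000 steps: {gaps[-1]:.5f}")
```

With the 1/k step size on this particular quadratic, the iterate equals minus the running average of the noise, so the gap shrinks as the Markov chain's time average concentrates around zero.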
Last-Iterate Guarantees for Learning in Co-coercive Games
arXiv (Cornell University) · 2026-04-21
preprint · Open access · Senior author
We establish finite-time last-iterate guarantees for vanilla stochastic gradient descent in co-coercive games under noisy feedback. This is a broad class of games that is more general than strongly monotone games, allows for multiple Nash equilibria, and includes examples such as quadratic games with negative semidefinite interaction matrices and potential games with smooth concave potentials. Prior work in this setting has relied on relative noise models, where the noise vanishes as iterates approach equilibrium, an assumption that is often unrealistic in practice. We work instead under a substantially more general noise model in which the second moment of the noise is allowed to scale affinely with the squared norm of the iterates, an assumption natural in learning with unbounded action spaces. Under this model, we prove a last-iterate bound of order $O(\log(t)/t^{1/3})$, the first such bound for co-coercive games under non-vanishing noise. We additionally establish almost sure convergence of the iterates to the set of Nash equilibria and derive time-average convergence guarantees.
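A minimal sketch of the kind of game this abstract mentions: a two-player quadratic game with a negative semidefinite interaction matrix and a whole line of Nash equilibria, played with noisy gradient steps. The payoffs, the noise parameters `sigma0` and `sigma1`, and the step-size schedule are hypothetical choices made for illustration and may differ from what the paper analyzes.

```python
import math
import random

def sgd_cocoercive_game(steps=20000, seed=0, sigma0=1.0, sigma1=0.1):
    """Noisy gradient play in a two-player quadratic game.

    Player i picks x_i and has payoff u_i(x) = -x_i^2 / 2 - x_1 * x_2, so each
    individual payoff gradient is -(x_1 + x_2). The interaction matrix
    [[-1, -1], [-1, -1]] is negative semidefinite, and every point with
    x_1 + x_2 = 0 is a Nash equilibrium.
    """
    rng = random.Random(seed)
    x = [3.0, 1.0]
    for t in range(1, steps + 1):
        step = 1.0 / (t + 10) ** (2.0 / 3.0)  # assumed schedule
        # Noise whose second moment grows affinely with ||x||^2.
        scale = math.sqrt(sigma0 ** 2 + sigma1 ** 2 * (x[0] ** 2 + x[1] ** 2))
        grad = -(x[0] + x[1])                 # same payoff gradient for both
        # Each player ascends its own payoff with independent noisy feedback.
        x = [xi + step * (grad + scale * rng.gauss(0.0, 1.0)) for xi in x]
    # Euclidean distance from x to the equilibrium line x_1 + x_2 = 0.
    return x, abs(x[0] + x[1]) / math.sqrt(2.0)

final_x, dist = sgd_cocoercive_game()
print(f"final iterate: {final_x}, distance to equilibrium set: {dist:.4f}")
```

The distance is measured to the equilibrium set rather than to a single point, because the noise keeps moving the iterate along the line of equilibria while the component across it contracts.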
Social Federated Learning (SFL): Leveraging Shared Data to Boost Learning Performance
SSRN Electronic Journal · 2026-01-01
preprint · Open access
Open MIND · 2026-02-18
preprint · Senior author
We present the first regret bound for classical online Q-learning in infinite-horizon discounted Markov decision processes (MDPs), without relying on optimism or bonus terms. We first analyze Boltzmann Q-learning with decaying temperature and show that its regret depends critically on the suboptimality gap of the MDP: for sufficiently large gaps, the regret is sublinear, while for small gaps it deteriorates and can approach linear growth. To address this limitation, we study a Smoothed $ε_n$-Greedy exploration scheme that combines $ε_n$-greedy and Boltzmann exploration, for which we prove a gap-robust regret bound of near-$\tilde{O}(N^{9/10})$. We also obtain sample complexity guarantees, with both regret and sample complexity bounds holding with high probability. To analyze these algorithms, we develop a high-probability concentration bound for contractive Markovian stochastic approximation with iterate- and time-dependent transition dynamics. This bound may be of independent interest as the contraction factor in our framework is allowed to converge to one asymptotically.
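The sketch below shows one plausible reading of an exploration rule that mixes $ε_n$-greedy and Boltzmann exploration inside tabular Q-learning: with probability $ε_n$ the agent acts uniformly at random, and otherwise it samples from a softmax over its current Q-values. The abstract does not spell out the scheme, so the mixing rule, the toy MDP, and the schedules for `eps`, `temperature`, and `lr` are all assumptions made here for illustration.

```python
import math
import random

GAMMA = 0.9  # discount factor

def env_step(state, action):
    """Deterministic toy MDP: action 0 stays, action 1 switches state.

    Only (state 1, action 0) pays reward 1, so the optimal policy is
    "switch in state 0, stay in state 1".
    """
    next_state = state if action == 0 else 1 - state
    reward = 1.0 if (state == 1 and action == 0) else 0.0
    return next_state, reward

def smoothed_eps_greedy(q_row, eps, temperature, rng):
    """With probability eps act uniformly, else sample a Boltzmann policy."""
    if rng.random() < eps:
        return rng.randrange(len(q_row))
    top = max(q_row)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_row]
    return rng.choices(range(len(q_row)), weights=weights)[0]

def q_learning(steps=20000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    visits = [[0, 0], [0, 0]]
    state = 0
    for n in range(1, steps + 1):
        eps = n ** -0.3                        # assumed exploration schedule
        temperature = 1.0 / math.log(n + 1.0)  # assumed cooling schedule
        action = smoothed_eps_greedy(q[state], eps, temperature, rng)
        next_state, reward = env_step(state, action)
        visits[state][action] += 1
        lr = visits[state][action] ** -0.6     # assumed step size
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += lr * (target - q[state][action])
        state = next_state
    return q

q = q_learning()
print("greedy policy:", [row.index(max(row)) for row in q])
```

Keeping a uniform component in the mixture means every action retains some probability of being tried even when the Q-value gap is small, which is the situation where the abstract says pure Boltzmann exploration degrades.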
Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis
ArXiv.org · 2026-03-20
article · Open access · Senior author
Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyses typically rely on martingale difference or Markov noise with bounded second moments, but many practical settings, including finance and communications, frequently encounter heavy-tailed and long-range dependent (LRD) noise. In this work, we study SA for finding the root of a strongly monotone operator under these non-classical noise models. We establish the first finite-time moment bounds in both settings, providing explicit convergence rates that quantify the impact of heavy tails and temporal dependence. Our analysis employs a noise-averaging argument that regularizes the impact of noise without modifying the iteration. Finally, we apply our general framework to stochastic gradient descent (SGD) and gradient play, and corroborate our finite-time analysis through numerical experiments.
Policy Gradient Methods for Non-Markovian Reinforcement Learning
arXiv (Cornell University) · 2026-05-11
preprint · Open access · Senior author
We study policy gradient methods for reinforcement learning in non-Markovian decision processes (NMDPs), where observations and rewards depend on the entire interaction history. To handle this dependence, the agent maintains an internal state that is recursively updated to provide a compact summary of past observations and actions. In contrast to approaches that treat the agent state dynamics as fixed or learn it via predictive objectives, we propose a reward-centric formulation that jointly optimizes the agent state dynamics and the control policy to maximize the expected cumulative reward. To this end, we consider a class of Agent State-Markov (ASM) policies, comprising an agent state dynamics and a control policy that maps the agent state to actions. We establish a novel policy gradient theorem for ASM policies, extending the classical policy gradient results from the Markovian setting to episodic and infinite-horizon discounted NMDPs. Building on this gradient expression, we propose the Agent State-Markov Policy Gradient (ASMPG) algorithm, which leverages the recursive structure of the agent state dynamics for efficient optimization. We establish finite-time and almost sure convergence guarantees, and empirically demonstrate that, on a range of non-Markovian tasks, ASMPG outperforms baselines that learn state representations via predictive objectives.
Frequent coauthors
- 56 shared
Zhengyuan Zhou
Fu Wai Hospital
- 45 shared
Panayotis Mertikopoulos
Laboratoire d'Informatique de Grenoble
- 35 shared
Neal Master
- 27 shared
Ilai Bistritz
Tel Aviv University
- 24 shared
Aditya Dua
Proteus Digital Health
- 24 shared
Carri W. Chan
- 24 shared
Peter W. Glynn
- 22 shared
Daniel Miller
Icahn School of Medicine at Mount Sinai
Education
- 1989
Ph.D., Electrical Engineering & Computer Sciences
University of California, Berkeley
- 1985
M.S., Electrical Engineering
Stanford University
- 1983
B.S., Electrical Engineering and Computer Science
University of California, Berkeley
Awards & honors
- Cisco Systems Faculty Development Chair
- David Morgenthaler Faculty Scholar
- IBM Faculty Award
- National Young Investigator Award
- Research Initiation Award from the National Science Foundati…