Ronald Parr
· Professor of Computer ScienceDuke University · Computer Science
Active 1993–2025
About
Ronald Parr is a Professor of Computer Science at Duke University, where he has contributed extensively to the fields of machine learning, reinforcement learning, and decision processes. His research focuses on developing algorithms for Markov Decision Processes, reinforcement learning, and related areas, with particular attention to policy iteration, value function approximation, and the application of probabilistic models in robotics and autonomous systems. Parr has authored numerous influential papers, including work on non-myopic sensing, factored MDPs, and the integration of noise as a regularizer in decision-making models. Throughout his career, Parr has been recognized for his contributions to artificial intelligence and machine learning, including winning awards such as the IJCAI-JAIR best paper award. He has held leadership roles, such as department chair at Duke University, and has delivered multiple graduation speeches. His work has advanced the understanding of optimal strategies in stochastic environments and has contributed to the development of scalable algorithms for complex decision-making tasks in robotics, security games, and reinforcement learning. Parr's research is characterized by a rigorous approach to solving real-world problems through innovative computational methods and theoretical insights.
Research topics
- Computer Science
- Machine Learning
- Artificial Intelligence
- Mathematics
- Algorithm
- Mathematical optimization
- Engineering
- Mathematical analysis
- Theoretical computer science
- Mathematical economics
Selected publications
arXiv (Cornell University) · 2025-01-03
preprintOpen accessSenior authorIn off-policy policy evaluation (OPE) tasks within reinforcement learning, Temporal Difference Learning(TD) and Fitted Q-Iteration (FQI) have traditionally been viewed as differing in the number of updates toward the target value function: TD makes one update, FQI makes an infinite number, and Partial Fitted Q-Iteration (PFQI) performs a finite number. We show that this view is not accurate, and provide a new mathematical perspective under linear value function approximation that unifies these methods as a single iterative method solving the same linear system, but using different matrix splitting schemes and preconditioners. We show that increasing the number of updates under the same target value function, i.e., the target network technique, is a transition from using a constant preconditioner to using a data-feature adaptive preconditioner. This elucidates, for the first time, why TD convergence does not necessarily imply FQI convergence, and establishes tight convergence connections among TD, PFQI, and FQI. Our framework enables sharper theoretical results than previous work and characterization of the convergence conditions for each algorithm, without relying on assumptions about the features (e.g., linear independence). We also provide an encoder-decoder perspective to better understand the convergence conditions of TD, and prove, for the first time, that when a large learning rate doesn't work, trying a smaller one may help. Our framework also leads to the discovery of new crucial conditions on features for convergence, and shows how common assumptions about features influence convergence, e.g., the assumption of linearly independent features can be dropped without compromising the convergence guarantees of stochastic TD in the on-policy setting. This paper is also the first to introduce matrix splitting into the convergence analysis of these algorithms.
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
2024-01-01 · 2 citations
articleUsing Noise to Infer Aspects of Simplicity Without Learning
2024-01-01
articleAmazing Things Come From Having Many Good Models
arXiv (Cornell University) · 2024-07-05 · 6 citations
preprintOpen accessThe Rashomon Effect, coined by Leo Breiman, describes the phenomenon that there exist many equally good predictive models for the same dataset. This phenomenon happens for many real datasets and when it does, it sparks both magic and consternation, but mostly magic. In light of the Rashomon Effect, this perspective piece proposes reshaping the way we think about machine learning, particularly for tabular data problems in the nondeterministic (noisy) setting. We address how the Rashomon Effect impacts (1) the existence of simple-yet-accurate models, (2) flexibility to address user preferences, such as fairness and monotonicity, without losing performance, (3) uncertainty in predictions, fairness, and explanations, (4) reliable variable importance, (5) algorithm choice, specifically, providing advanced knowledge of which algorithms might be suitable for a given problem, and (6) public policy. We also discuss a theory of when the Rashomon Effect occurs and why. Our goal is to illustrate how the Rashomon Effect can have a massive impact on the use of machine learning for complex problems in society.
An Optimal Tightness Bound for the Simulation Lemma
arXiv (Cornell University) · 2024-06-24
preprintOpen accessSenior authorWe present a bound for value-prediction error with respect to model misspecification that is tight, including constant factors. This is a direct improvement of the "simulation lemma," a foundational result in reinforcement learning. We demonstrate that existing bounds are quite loose, becoming vacuous for large discount factors, due to the suboptimal treatment of compounding probability errors. By carefully considering this quantity on its own, instead of as a subcomponent of value error, we derive a bound that is sub-linear with respect to transition function misspecification. We then demonstrate broader applicability of this technique, improving a similar bound in the related subfield of hierarchical abstraction.
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
arXiv (Cornell University) · 2024-07-10
preprintOpen accessReinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to -- or knowledge of -- an underlying, unobservable state space. Our metric, the $λ$-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD($λ$) with a different value of $λ$. Since TD($λ{=}0$) makes an implicit Markov assumption and TD($λ{=}1$) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the $λ$-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the $λ$-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different $λ$ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network.
A Path to Simpler Models Starts With Noise
PubMed · 2023-10-30 · 1 citations
preprintOpen accessThe Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.
A Path to Simpler Models Starts With Noise
2023-01-01
articleOn the Existence of Simpler Machine Learning Models
2022 ACM Conference on Fairness, Accountability, and Transparency · 2022 · 66 citations
Senior authorCorresponding- Computer Science
- Computer Science
- Artificial Intelligence
It is almost always easier to find an accurate-but-complex model than an accurate-yet-simple model. Finding optimal, sparse, accurate models of various forms (linear models with integer coefficients, decision sets, rule lists, decision trees) is generally NP-hard. We often do not know whether the search for a simpler model will be worthwhile, and thus we do not go to the trouble of searching for one. In this work, we ask an important practical question: can accurate-yet-simple models be proven to exist, or shown likely to exist, before explicitly searching for them? We hypothesize that there is an important reason that simple-yet-accurate models often do exist. This hypothesis is that the size of the Rashomon set is often large, where the Rashomon set is the set of almost-equally-accurate models from a function class. If the Rashomon set is large, it contains numerous accurate models, and perhaps at least one of them is the simple model we desire. In this work, we formally present the Rashomon ratio as a new gauge of simplicity for a learning problem, depending on a function class and a data set. The Rashomon ratio is the ratio of the volume of the set of accurate models to the volume of the hypothesis space, and it is different from standard complexity measures from statistical learning theory. Insight from studying the Rashomon ratio provides an easy way to check whether a simpler model might exist for a problem before finding it, namely whether several different machine learning methods achieve similar performance on the data. In that sense, the Rashomon ratio is a powerful tool for understanding why and when an accurate-yet-simple model might exist. If, as we hypothesize in this work, many real-world data sets admit large Rashomon sets, the implications are vast: it means that simple or interpretable models may often be used for high-stakes decisions without losing accuracy.
Computing Optimal Strategies to Commit to in Stochastic Games
Proceedings of the AAAI Conference on Artificial Intelligence · 2021 · 35 citations
- Computer Science
- Computer Science
- Mathematical economics
Significant progress has been made recently in the following two lines of research in the intersection of AI and game theory: (1) the computation of optimal strategies to commit to (Stackelberg strategies), and (2) the computation of correlated equilibria of stochastic games. In this paper, we unite these two lines of research by studying the computation of Stackelberg strategies in stochastic games. We provide theoretical results on the value of being able to commit and the value of being able to correlate, as well as complexity results about computing Stackelberg strategies in stochastic games. We then modify the QPACE algorithm (MacDermed et al. 2011) to compute Stackelberg strategies, and provide experimental results.
Recent grants
EAGER: IIS: RI: Learning in Continuous and High Dimensional Action Spaces
NSF · $150k · 2011–2013
RI: Small: Non-parametric Approximate Dynamic Programming for Continuous Domains
NSF · $458k · 2012–2018
Collaborative: RI: Feature Discovery and Benchmarks for Exportable Reinforcement Learning
NSF · $225k · 2007–2011
CAREER: Observing to Plan - Planning to Observe
NSF · $446k · 2006–2013
Frequent coauthors
- 17 shared
Daphne Koller
- 9 shared
Carlos Guestrin
- 9 shared
Michael L. Littman
- 9 shared
Michail G. Lagoudakis
Technical University of Crete
- 8 shared
Christopher Painter-Wakefield
Duke University
- 6 shared
Lawrence Carin
King Abdullah University of Science and Technology
- 6 shared
Vincent Conitzer
Carnegie Mellon University
- 5 shared
Jason Pazis
Amazon (United States)
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Ronald Parr
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup