George D. Konidaris
Associate Professor of Computer Science · Verified · Brown University · Computer Science
Active 2002–2025
About
George D. Konidaris is an Associate Professor of Computer Science at Brown University and the director of the Intelligent Robot Lab, which is part of Brown's Integrative, General AI initiative (bigAI). His research is driven by the overarching scientific goal of understanding the fundamental computational processes that generate intelligence and using this understanding to design generally intelligent robots. Konidaris focuses on building intelligent, autonomous, general-purpose robots capable of performing a wide variety of tasks and operating in diverse environments. His work centers on designing agents that learn abstraction hierarchies to enable fast, goal-oriented planning. He develops and applies techniques from machine learning, reinforcement learning, optimal control, and planning to construct well-grounded hierarchies that facilitate rapid planning in common cases while maintaining robustness to uncertainty at every level of control. He emphasizes that solving the AI problem requires advances in all these areas as well as in their integration. In addition to his academic role, Konidaris is a co-founder of two technology startups. Realtime Robotics commercializes research on robot motion planning to simplify and improve robotic automation, while Lelapa AI is a commercial AI research lab based in Johannesburg, South Africa, focused on technology developed by and for Africans. Konidaris lives in Providence, Rhode Island, where he continues to lead research efforts aimed at advancing the field of intelligent robotics through both theoretical and applied contributions.
Research topics
- Artificial Intelligence
- Computer Science
- Machine Learning
- Computer Security
- Computer vision
- Engineering
- Human–computer interaction
- Mathematics
- Medicine
- Mathematical analysis
- Algorithm
- Biomedical engineering
- Simulation
- Theoretical computer science
- Mathematical optimization
- Ophthalmology
- Surgery
Selected publications
Visual Theory of Mind Enables the Invention of Proto-Writing
ArXiv.org · 2025-02-03
preprint · Open access · Senior author
Symbolic writing systems are graphical semiotic codes that are ubiquitous in modern society but are otherwise absent in the animal kingdom. Anthropological evidence suggests that the earliest forms of some writing systems originally consisted of iconic pictographs, which signify their referent via visual resemblance. While previous studies have examined the emergence and, separately, the evolution of pictographic systems through a computational lens, most employ non-naturalistic methodologies that make it difficult to draw clear analogies to human and animal cognition. We develop a multi-agent reinforcement learning testbed for emergent communication called a Signification Game, and formulate a model of inferential communication that enables agents to leverage visual theory of mind to communicate actions using pictographs. Our model, which is situated within a broader formalism for animal communication, sheds light on the cognitive and cultural processes underlying the emergence of proto-writing.
Knowledge Retention for Continual Model-Based Reinforcement Learning
ArXiv.org · 2025-03-06
preprint · Open access · Senior author
We propose DRAGO, a novel approach for continual model-based reinforcement learning aimed at improving the incremental development of world models across a sequence of tasks that differ in their reward functions but not the state space or dynamics. DRAGO comprises two key components: Synthetic Experience Rehearsal, which leverages generative models to create synthetic experiences from past tasks, allowing the agent to reinforce previously learned dynamics without storing data, and Regaining Memories Through Exploration, which introduces an intrinsic reward mechanism to guide the agent toward revisiting relevant states from prior tasks. Together, these components enable the agent to maintain a comprehensive and continually developing world model, facilitating more effective learning and adaptation across diverse environments. Empirical evaluations demonstrate that DRAGO is able to preserve knowledge across tasks, achieving superior performance in various continual learning scenarios.
Accelerating Residual Reinforcement Learning With Uncertainty Estimation
IEEE Robotics and Automation Letters · 2025-11-24
article
Residual Reinforcement Learning (RL) is a popular approach for adapting pretrained policies by learning a lightweight residual policy that provides corrective actions. While Residual RL is more sample-efficient than finetuning the entire base policy, existing methods struggle with sparse rewards and are designed for deterministic base policies. We propose two improvements to Residual RL that further enhance its sample efficiency and make it suitable for stochastic base policies. First, we leverage uncertainty estimates of the base policy to focus exploration on regions in which the base policy is not confident. Second, we propose a simple modification to off-policy residual learning that allows it to observe base actions and better handle stochastic base policies. We evaluate our method with both Gaussian-based and Diffusion-based stochastic base policies on tasks from Robosuite and D4RL, and compare against state-of-the-art finetuning methods, demo-augmented RL methods, and other Residual RL methods. Our algorithm significantly outperforms existing baselines in a variety of simulation benchmark environments. We also deploy our learned policies in the real world to demonstrate their robustness with zero-shot sim-to-real transfer.
ArXiv.org · 2025-07-31
preprint · Open access · Senior author
Mitigating partial observability is a necessary but challenging task for general reinforcement learning algorithms. To improve an algorithm's ability to mitigate partial observability, researchers need comprehensive benchmarks to gauge progress. Most algorithms tackling partial observability are only evaluated on benchmarks with simple forms of state aliasing, such as feature masking and Gaussian noise. Such benchmarks do not represent the many forms of partial observability seen in real domains, like visual occlusion or unknown opponent intent. We argue that a partially observable benchmark should have two key properties. The first is coverage in its forms of partial observability, to ensure an algorithm's generalizability. The second is a large gap between the performance of agents with more or less state information, all other factors roughly equal. This gap implies that an environment is memory improvable, meaning that performance gains in a domain come from an algorithm's ability to cope with partial observability rather than from other factors. We introduce best-practice guidelines for empirically benchmarking reinforcement learning under partial observability, as well as the open-source library POBAX: Partially Observable Benchmarks in JAX. We characterize the types of partial observability present in various environments and select representative environments for our benchmark. These environments include localization and mapping, visual control, games, and more. Additionally, we show that these tasks are all memory improvable and require hard-to-learn memory functions, providing a concrete signal for partial observability research. This framework includes recommended hyperparameters and algorithm implementations for fast, out-of-the-box evaluation, as well as highly performant environments implemented in JAX for GPU-scalable experimentation.
ArXiv.org · 2025-10-02
preprint · Open access · Senior author
Algorithms that exploit factored Markov decision processes are far more sample-efficient than factor-agnostic methods, yet they assume a factored representation is known a priori -- a requirement that breaks down when the agent sees only high-dimensional observations. Conversely, deep reinforcement learning handles such inputs but cannot benefit from factored structure. We address this representation problem with Action-Controllable Factorization (ACF), a contrastive learning approach that uncovers independently controllable latent variables -- state components each action can influence separately. ACF leverages sparsity: actions typically affect only a subset of variables, while the rest evolve under the environment's dynamics, yielding informative data for contrastive training. ACF recovers the ground truth controllable factors directly from pixel observations on three benchmarks with known factored structure -- Taxi, FourRooms, and MiniGrid-DoorKey -- consistently outperforming baseline disentanglement algorithms.
Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network
2025-05-28
article
A major challenge for reinforcement learning is automatically generating curricula to reduce training time or improve performance in some target task. We introduce SEBNs (Skill-Environment Bayesian Networks) which model a probabilistic relationship between a set of skills, a set of goals that relate to the reward structure, and a set of environment features to predict policy performance on (possibly unseen) tasks. We develop an algorithm that uses the inferred estimates of agent success from an SEBN to weigh the possible next tasks by expected improvement. We evaluate the benefit of the resulting curriculum on three environments: a discrete gridworld, continuous control, and simulated robotics. The results show that SEBN-based curricula frequently outperform other baselines.
Optimal Interactive Learning on the Job via Facility Location Planning
2025-06-21 · 1 citation
article · Open access · Senior author
Collaborative robots must continually adapt to novel tasks and user preferences without overburdening the user. While prior interactive robot learning methods aim to reduce human effort, they are typically limited to single-task scenarios and are not well-suited for sustained, multi-task collaboration. We propose COIL (Cost-Optimal Interactive Learning), a multitask interaction planner that minimizes human effort across a sequence of tasks by strategically selecting among three query types (skill, preference, and help). When user preferences are known, we formulate COIL as an uncapacitated facility location (UFL) problem, which enables bounded-suboptimal planning in polynomial time using off-the-shelf approximation algorithms. We extend our formulation to handle uncertainty in user preferences by incorporating one-step belief space planning, which uses these approximation algorithms as subroutines to maintain polynomial-time performance. Simulated and physical experiments on manipulation tasks show that our framework significantly reduces the amount of work allocated to the human while maintaining successful task completion.
Least Commitment Planning for the Object Scouting Problem
2025-10-19
article
State uncertainty is a primary obstacle to effective long-horizon robot task planning. State uncertainty can be decomposed into spatial uncertainty, which is resolved using SLAM, and uncertainty about the objects in the environment, which is formalized as the object scouting problem and modeled using the Locally Observable Markov Decision Process (LOMDP). We introduce a new planning framework specifically designed for object scouting with LOMDPs called the Scouting Partial-Order Planner (SPOP), which exploits the characteristics of partial order and regression planning to plan around knowledge gaps the robot may have about the existence, location, and state of relevant objects in its environment. Our results highlight the benefits of partial-order planning, demonstrating its suitability for object scouting due to its ability to identify absent but task-relevant objects, and show that it outperforms comparable planners in plan length, computation time, and execution time.
Discovering Temporal Structure: An Overview of Hierarchical Reinforcement Learning
ArXiv.org · 2025-06-16
preprint · Open access
Developing agents capable of exploring, planning and learning in complex open-ended environments is a grand challenge in artificial intelligence (AI). Hierarchical reinforcement learning (HRL) offers a promising solution to this challenge by discovering and exploiting the temporal structure within a stream of experience. The strong appeal of the HRL framework has led to a rich and diverse body of literature attempting to discover a useful structure. However, it is still not clear how one might define what constitutes good structure in the first place, or the kind of problems in which identifying it may be helpful. This work aims to identify the benefits of HRL from the perspective of the fundamental challenges in decision-making, as well as highlight its impact on the performance trade-offs of AI agents. Through these benefits, we then cover the families of methods that discover temporal structure in HRL, ranging from learning directly from online experience to offline datasets, to leveraging large language models (LLMs). Finally, we highlight the challenges of temporal structure discovery and the domains that are particularly well-suited for such endeavours.
SkillWrapper: Generative Predicate Invention for Task-level Planning
ArXiv.org · 2025-11-22
preprint · Open access
Generalizing from individual skill executions to solving long-horizon tasks remains a core challenge in building autonomous agents. A promising direction is learning high-level, symbolic abstractions of the low-level skills of the agents, enabling reasoning and planning independent of the low-level state space. Among possible high-level representations, object-centric skill abstraction with symbolic predicates has been proven to be efficient because of its compatibility with domain-independent planners. Recent advances in foundation models have made it possible to generate symbolic predicates that operate on raw sensory inputs, a process we call generative predicate invention, to facilitate downstream abstraction learning. However, it remains unclear which formal properties the learned representations must satisfy, and how they can be learned to guarantee these properties. In this paper, we address both questions by presenting a formal theory of generative predicate invention for skill abstraction, resulting in symbolic operators that can be used for provably sound and complete planning. Within this framework, we propose SkillWrapper, a method that leverages foundation models to actively collect robot data and learn human-interpretable, plannable representations of black-box skills, using only RGB image observations. Our extensive empirical evaluation in simulation and on real robots shows that SkillWrapper learns abstract representations that enable solving unseen, long-horizon tasks in the real world with black-box skills.
Recent grants
RI: Medium: Learning Task-Specific Representations for Broadly Capable Reinforcement Learning Agents
NSF · $1.2M · 2020–2024
CAREER: Learning Symbolic Representations for Robot Manipulation
NSF · $566k · 2019–2024
NSF · $208k · 2017–2021
Frequent coauthors
- 70 shared
Stefanie Tellex
Brown University
- 33 shared
Eric Rosen
Brown University
- 29 shared
Michael L. Littman
- 24 shared
Andrew G. Barto
University of Massachusetts Amherst
- 20 shared
Ben Abbatematteo
- 20 shared
Leslie Pack Kaelbling
- 19 shared
Benjamin Burchfiel
- 15 shared
Cameron Allen
Monash University
Education
- 2009
Ph.D., Computer Science
Brown University
- 2005
M.S., Computer Science
Brown University
- 2003
B.S., Computer Science
Brown University