George D Konidaris

Associate Professor of Computer Science · Verified

Brown University · Computer Science

Active 2002–2025

h-index: 47

Citations: 7.1k

Papers: 259 (114 in the last 5 years)

Funding: $2.0M

About

George D. Konidaris is an Associate Professor of Computer Science at Brown University and the director of the Intelligent Robot Lab, which is part of Brown's Integrative, General AI initiative (bigAI). His research is driven by the overarching scientific goal of understanding the fundamental computational processes that generate intelligence and using this understanding to design generally intelligent robots. Konidaris focuses on building intelligent, autonomous, general-purpose robots capable of performing a wide variety of tasks and operating in diverse environments. His work centers on designing agents that learn abstraction hierarchies to enable fast, goal-oriented planning. He develops and applies techniques from machine learning, reinforcement learning, optimal control, and planning to construct well-grounded hierarchies that facilitate rapid planning in common cases while maintaining robustness to uncertainty at every level of control. He emphasizes that solving the AI problem requires advances in all these areas as well as in their integration. In addition to his academic role, Konidaris is a co-founder of two technology startups. Realtime Robotics commercializes research on robot motion planning to simplify and improve robotic automation, while Lelapa AI is a commercial AI research lab based in Johannesburg, South Africa, focused on technology developed by and for Africans. Konidaris lives in Providence, Rhode Island, where he continues to lead research efforts aimed at advancing the field of intelligent robotics through both theoretical and applied contributions.

Research topics

Artificial Intelligence
Computer Science
Machine Learning
Computer Security
Computer vision
Engineering
Human–computer interaction
Mathematics
Medicine
Mathematical analysis
Algorithm
Biomedical engineering
Simulation
Theoretical computer science
Mathematical optimization
Ophthalmology
Surgery

Selected publications

Visual Theory of Mind Enables the Invention of Proto-Writing
ArXiv.org · 2025-02-03
Preprint · Open access · Senior author
Symbolic writing systems are graphical semiotic codes that are ubiquitous in modern society but are otherwise absent in the animal kingdom. Anthropological evidence suggests that the earliest forms of some writing systems originally consisted of iconic pictographs, which signify their referent via visual resemblance. While previous studies have examined the emergence and, separately, the evolution of pictographic systems through a computational lens, most employ non-naturalistic methodologies that make it difficult to draw clear analogies to human and animal cognition. We develop a multi-agent reinforcement learning testbed for emergent communication called a Signification Game, and formulate a model of inferential communication that enables agents to leverage visual theory of mind to communicate actions using pictographs. Our model, which is situated within a broader formalism for animal communication, sheds light on the cognitive and cultural processes underlying the emergence of proto-writing.
Knowledge Retention for Continual Model-Based Reinforcement Learning
ArXiv.org · 2025-03-06
Preprint · Open access · Senior author
We propose DRAGO, a novel approach for continual model-based reinforcement learning aimed at improving the incremental development of world models across a sequence of tasks that differ in their reward functions but not the state space or dynamics. DRAGO comprises two key components: Synthetic Experience Rehearsal, which leverages generative models to create synthetic experiences from past tasks, allowing the agent to reinforce previously learned dynamics without storing data, and Regaining Memories Through Exploration, which introduces an intrinsic reward mechanism to guide the agent toward revisiting relevant states from prior tasks. Together, these components enable the agent to maintain a comprehensive and continually developing world model, facilitating more effective learning and adaptation across diverse environments. Empirical evaluations demonstrate that DRAGO is able to preserve knowledge across tasks, achieving superior performance in various continual learning scenarios.
Accelerating Residual Reinforcement Learning With Uncertainty Estimation
IEEE Robotics and Automation Letters · 2025-11-24
article
Residual Reinforcement Learning (RL) is a popular approach for adapting pretrained policies by learning a lightweight residual policy that provides corrective actions. While Residual RL is more sample-efficient than finetuning the entire base policy, existing methods struggle with sparse rewards and are designed for deterministic base policies. We propose two improvements to Residual RL that further enhance its sample efficiency and make it suitable for stochastic base policies. First, we leverage uncertainty estimates of the base policy to focus exploration on regions in which the base policy is not confident. Second, we propose a simple modification to off-policy residual learning that allows it to observe base actions and better handle stochastic base policies. We evaluate our method with both Gaussian-based and Diffusion-based stochastic base policies on tasks from Robosuite and D4RL, and compare against state-of-the-art finetuning methods, demo-augmented RL methods, and other Residual RL methods. Our algorithm significantly outperforms existing baselines in a variety of simulation benchmark environments. We also deploy our learned policies in the real world to demonstrate their robustness with zero-shot sim-to-real transfer.
Benchmarking Partial Observability in Reinforcement Learning with a Suite of Memory-Improvable Domains
ArXiv.org · 2025-07-31
Preprint · Open access · Senior author
Mitigating partial observability is a necessary but challenging task for general reinforcement learning algorithms. To improve an algorithm's ability to mitigate partial observability, researchers need comprehensive benchmarks to gauge progress. Most algorithms tackling partial observability are only evaluated on benchmarks with simple forms of state aliasing, such as feature masking and Gaussian noise. Such benchmarks do not represent the many forms of partial observability seen in real domains, like visual occlusion or unknown opponent intent. We argue that a partially observable benchmark should have two key properties. The first is coverage in its forms of partial observability, to ensure an algorithm's generalizability. The second is a large gap between the performance of agents with more or less state information, all other factors being roughly equal. This gap implies that an environment is memory improvable: performance gains in the domain come from an algorithm's ability to cope with partial observability as opposed to other factors. We introduce best-practice guidelines for empirically benchmarking reinforcement learning under partial observability, as well as the open-source library POBAX: Partially Observable Benchmarks in JAX. We characterize the types of partial observability present in various environments and select representative environments for our benchmark. These environments include localization and mapping, visual control, games, and more. Additionally, we show that these tasks are all memory improvable and require hard-to-learn memory functions, providing a concrete signal for partial observability research. This framework includes recommended hyperparameters and algorithm implementations for fast, out-of-the-box evaluation, as well as highly performant environments implemented in JAX for GPU-scalable experimentation.
From Pixels to Factors: Learning Independently Controllable State Variables for Reinforcement Learning
ArXiv.org · 2025-10-02
Preprint · Open access · Senior author
Algorithms that exploit factored Markov decision processes are far more sample-efficient than factor-agnostic methods, yet they assume a factored representation is known a priori -- a requirement that breaks down when the agent sees only high-dimensional observations. Conversely, deep reinforcement learning handles such inputs but cannot benefit from factored structure. We address this representation problem with Action-Controllable Factorization (ACF), a contrastive learning approach that uncovers independently controllable latent variables -- state components each action can influence separately. ACF leverages sparsity: actions typically affect only a subset of variables, while the rest evolve under the environment's dynamics, yielding informative data for contrastive training. ACF recovers the ground truth controllable factors directly from pixel observations on three benchmarks with known factored structure -- Taxi, FourRooms, and MiniGrid-DoorKey -- consistently outperforming baseline disentanglement algorithms.
Automating Curriculum Learning for Reinforcement Learning using a Skill-Based Bayesian Network
2025-05-28
article
A major challenge for reinforcement learning is automatically generating curricula to reduce training time or improve performance in some target task. We introduce SEBNs (Skill-Environment Bayesian Networks) which model a probabilistic relationship between a set of skills, a set of goals that relate to the reward structure, and a set of environment features to predict policy performance on (possibly unseen) tasks. We develop an algorithm that uses the inferred estimates of agent success from an SEBN to weigh the possible next tasks by expected improvement. We evaluate the benefit of the resulting curriculum on three environments: a discrete gridworld, continuous control, and simulated robotics. The results show that SEBN-based curricula frequently outperform other baselines.
Optimal Interactive Learning on the Job via Facility Location Planning
2025-06-21 · 1 citation
Article · Open access · Senior author
Collaborative robots must continually adapt to novel tasks and user preferences without overburdening the user. While prior interactive robot learning methods aim to reduce human effort, they are typically limited to single-task scenarios and are not well-suited for sustained, multi-task collaboration. We propose COIL (Cost-Optimal Interactive Learning), a multitask interaction planner that minimizes human effort across a sequence of tasks by strategically selecting among three query types (skill, preference, and help). When user preferences are known, we formulate COIL as an uncapacitated facility location (UFL) problem, which enables bounded-suboptimal planning in polynomial time using off-the-shelf approximation algorithms. We extend our formulation to handle uncertainty in user preferences by incorporating one-step belief space planning, which uses these approximation algorithms as subroutines to maintain polynomial-time performance. Simulated and physical experiments on manipulation tasks show that our framework significantly reduces the amount of work allocated to the human while maintaining successful task completion.
Least Commitment Planning for the Object Scouting Problem
2025-10-19
article
State uncertainty is a primary obstacle to effective long-horizon robot task planning. State uncertainty can be decomposed into spatial uncertainty—resolved using SLAM—and uncertainty about the objects in the environment, formalized as the object scouting problem and modeled using the Locally Observable Markov Decision Process (LOMDP). We introduce a new planning framework specifically designed for object scouting with LOMDPs called the Scouting Partial-Order Planner (SPOP), which exploits the characteristics of partial order and regression planning to plan around knowledge gaps the robot may have about the existence, location, and state of relevant objects in its environment. Our results highlight the benefits of partial-order planning, demonstrating its suitability for object scouting due to its ability to identify absent but task-relevant objects, and show that it outperforms comparable planners in plan length, computation time, and execution time.
Discovering Temporal Structure: An Overview of Hierarchical Reinforcement Learning
ArXiv.org · 2025-06-16
Preprint · Open access
Developing agents capable of exploring, planning and learning in complex open-ended environments is a grand challenge in artificial intelligence (AI). Hierarchical reinforcement learning (HRL) offers a promising solution to this challenge by discovering and exploiting the temporal structure within a stream of experience. The strong appeal of the HRL framework has led to a rich and diverse body of literature attempting to discover a useful structure. However, it is still not clear how one might define what constitutes good structure in the first place, or the kind of problems in which identifying it may be helpful. This work aims to identify the benefits of HRL from the perspective of the fundamental challenges in decision-making, as well as highlight its impact on the performance trade-offs of AI agents. Through these benefits, we then cover the families of methods that discover temporal structure in HRL, ranging from learning directly from online experience to offline datasets, to leveraging large language models (LLMs). Finally, we highlight the challenges of temporal structure discovery and the domains that are particularly well-suited for such endeavours.
SkillWrapper: Generative Predicate Invention for Task-level Planning
ArXiv.org · 2025-11-22
Preprint · Open access
Generalizing from individual skill executions to solving long-horizon tasks remains a core challenge in building autonomous agents. A promising direction is learning high-level, symbolic abstractions of the low-level skills of the agents, enabling reasoning and planning independent of the low-level state space. Among possible high-level representations, object-centric skill abstraction with symbolic predicates has been proven to be efficient because of its compatibility with domain-independent planners. Recent advances in foundation models have made it possible to generate symbolic predicates that operate on raw sensory inputs, a process we call generative predicate invention, to facilitate downstream abstraction learning. However, it remains unclear which formal properties the learned representations must satisfy, and how they can be learned to guarantee these properties. In this paper, we address both questions by presenting a formal theory of generative predicate invention for skill abstraction, resulting in symbolic operators that can be used for provably sound and complete planning. Within this framework, we propose SkillWrapper, a method that leverages foundation models to actively collect robot data and learn human-interpretable, plannable representations of black-box skills, using only RGB image observations. Our extensive empirical evaluation in simulation and on real robots shows that SkillWrapper learns abstract representations that enable solving unseen, long-horizon tasks in the real world with black-box skills.

Recent grants

RI: Medium: Learning Task-Specific Representations for Broadly Capable Reinforcement Learning Agents
NSF · $1.2M · 2020–2024
CAREER: Learning Symbolic Representations for Robot Manipulation
NSF · $566k · 2019–2024
RI: Small: Collaborative Research: Hidden Parameter Markov Decision Processes: Exploiting Structure in Families of Tasks
NSF · $208k · 2017–2021

Frequent coauthors

Stefanie Tellex
Brown University
70 shared
Eric Rosen
Brown University
33 shared
Michael L. Littman
29 shared
Andrew G. Barto
University of Massachusetts Amherst
24 shared
Ben Abbatematteo
20 shared
Leslie Pack Kaelbling
20 shared
Benjamin Burchfiel
19 shared
Cameron Allen
Monash University
15 shared

Education

Ph.D., Computer Science
Brown University
2009
M.S., Computer Science
Brown University
2005
B.S., Computer Science
Brown University
2003
