Sven Koenig
· Chancellor’s Professor and Bren ChairVerifiedUniversity of California, Irvine · Computer Science
Active 1956–2026
About
Sven Koenig is a Chancellor’s Professor and Bren Chair in the Department of Computer Science at UC Irvine. His research focuses on intelligent systems that operate in large, nondeterministic, nonstationary, or only partially known domains. Most of his work centers around techniques for decision making, including planning and learning, that enable single situated agents such as robots or decision-support systems, as well as teams of agents, to act intelligently in their environments and exhibit goal-directed behavior in real time. These systems are designed to function effectively even with incomplete knowledge of their environments, imperfect manipulation abilities, limited or noisy perception, or insufficient reasoning speed. His research draws on multiple fields including artificial intelligence, decision theory, and operations research, reflecting the interdisciplinary nature of his work. Applications of his research include robotics, logistics, and video games. Sven Koenig is a fellow of several professional organizations, including the Association for the Advancement of Artificial Intelligence (AAAI), the Association for Computing Machinery (ACM), the Institute of Electrical and Electronics Engineers (IEEE), and the American Association for the Advancement of Science (AAAS). His educational background includes a Ph.D. and M.S. in Computer Science from Carnegie Mellon University, an M.S. in Computer Science from the University of California at Berkeley, and a Diplom in Computer Science and Business Administration from the University of Hamburg.
Research topics
- Computer Science
- Computer Security
- Artificial Intelligence
- Mathematics
- Mathematical optimization
- Distributed computing
- Physics
Selected publications
Open MIND · 2026-01-27
preprintMulti-Agent Path Finding (MAPF) is an NP-hard problem with applications in warehouse automation and multi-robot coordination. Learning-based MAPF solvers offer fast and scalable planning but often produce feasible trajectories that contain unnecessary or oscillatory movements. We propose Judgelight, a post-optimization layer that improves trajectory quality after a MAPF solver generates a feasible schedule. Judgelight collapses closed subwalks in agents' trajectories to remove redundant movements while preserving all feasibility constraints. We formalize this process as MAPF-Collapse, prove that it is NP-hard, and present an exact optimization approach by formulating it as integer linear programming (ILP) problem. Experimental results show Judgelight consistently reduces solution cost by around 20%, particularly for learning-based solvers, producing trajectories that are better suited for real-world deployment.
Dynamic Incentivized Cooperation under Changing Rewards
ArXiv.org · 2026-01-10
articleOpen accessSenior authorPeer incentivization (PI) is a popular multi-agent reinforcement learning approach where all agents can reward or penalize each other to achieve cooperation in social dilemmas. Despite their potential for scalable cooperation, current PI methods heavily depend on fixed incentive values that need to be appropriately chosen with respect to the environmental rewards and thus are highly sensitive to their changes. Therefore, they fail to maintain cooperation under changing rewards in the environment, e.g., caused by modified specifications, varying supply and demand, or sensory flaws - even when the conditions for mutual cooperation remain the same. In this paper, we propose Dynamic Reward Incentives for Variable Exchange (DRIVE), an adaptive PI approach to cooperation in social dilemmas with changing rewards. DRIVE agents reciprocally exchange reward differences to incentivize mutual cooperation in a completely decentralized way. We show how DRIVE achieves mutual cooperation in the general Prisoner's Dilemma and empirically evaluate DRIVE in more complex sequential social dilemmas with changing rewards, demonstrating its ability to achieve and maintain cooperation, in contrast to current state-of-the-art PI methods.
Dynamic Incentivized Cooperation under Changing Rewards
arXiv (Cornell University) · 2026-01-10
preprintOpen accessSenior authorPeer incentivization (PI) is a popular multi-agent reinforcement learning approach where all agents can reward or penalize each other to achieve cooperation in social dilemmas. Despite their potential for scalable cooperation, current PI methods heavily depend on fixed incentive values that need to be appropriately chosen with respect to the environmental rewards and thus are highly sensitive to their changes. Therefore, they fail to maintain cooperation under changing rewards in the environment, e.g., caused by modified specifications, varying supply and demand, or sensory flaws - even when the conditions for mutual cooperation remain the same. In this paper, we propose Dynamic Reward Incentives for Variable Exchange (DRIVE), an adaptive PI approach to cooperation in social dilemmas with changing rewards. DRIVE agents reciprocally exchange reward differences to incentivize mutual cooperation in a completely decentralized way. We show how DRIVE achieves mutual cooperation in the general Prisoner's Dilemma and empirically evaluate DRIVE in more complex sequential social dilemmas with changing rewards, demonstrating its ability to achieve and maintain cooperation, in contrast to current state-of-the-art PI methods.
ArXiv.org · 2026-01-27
articleOpen accessMulti-Agent Path Finding (MAPF) is an NP-hard problem with applications in warehouse automation and multi-robot coordination. Learning-based MAPF solvers offer fast and scalable planning but often produce feasible trajectories that contain unnecessary or oscillatory movements. We propose Judgelight, a post-optimization layer that improves trajectory quality after a MAPF solver generates a feasible schedule. Judgelight collapses closed subwalks in agents' trajectories to remove redundant movements while preserving all feasibility constraints. We formalize this process as MAPF-Collapse, prove that it is NP-hard, and present an exact optimization approach by formulating it as integer linear programming (ILP) problem. Experimental results show Judgelight consistently reduces solution cost by around 20%, particularly for learning-based solvers, producing trajectories that are better suited for real-world deployment.
2025-05-19 · 2 citations
articleSenior authorA modern smart factory runs a manufacturing procedure using a collection of programmable machines. Typically, materials are ferried between these machines using a team of mobile robots. To embed a manufacturing procedure in a smart factory, a factory operator must a) assign its processes to the smart factory's machines and b) determine how agents should carry materials between machines. A good embedding maximizes the smart factory's throughput; the rate at which it outputs products. Existing smart factory management systems solve the aforementioned problems sequentially, limiting the throughput that they can achieve. In this paper we introduce ACES, the Anytime Cyclic Embedding Solver, the first solver which jointly optimizes the assignment of processes to machines and the assignment of paths to agents. We evaluate ACES and show that it can scale to real industrial scenarios.
LSPO: Length-aware Dynamic Sampling for Policy Optimization in LLM Reasoning
ArXiv.org · 2025-10-01
preprintOpen accessSince the release of Deepseek-R1, reinforcement learning with verifiable rewards (RLVR) has become a central approach for training large language models (LLMs) on reasoning tasks. Recent work has largely focused on modifying loss functions to make RLVR more efficient and effective. In this paper, motivated by studies of overthinking in LLMs, we propose Length-aware Sampling for Policy Optimization (LSPO), a novel meta-RLVR algorithm that dynamically selects training data at each step based on the average response length. We evaluate LSPO across multiple base models and datasets, demonstrating that it consistently improves learning effectiveness. In addition, we conduct a detailed ablation study to examine alternative ways of incorporating length signals into dynamic sampling, offering further insights and highlighting promising directions for future research.
Proceedings of the International Symposium on Combinatorial Search · 2025-07-19
articleOpen accessThe bi-objective shortest-path (BOSP) problem seeks to find paths between start and target vertices of a graph while optimizing two conflicting objective functions. We consider the BOSP problem in the presence of correlated objectives. Such correlations often occur in real-world settings such as road networks, where optimizing two positively correlated objectives, such as travel time and fuel consumption, is common. BOSP is generally computationally challenging as the size of the search space is exponential in the number of objective functions and the graph size. Bounded sub-optimal BOSP solvers such as A*pex alleviate this complexity by approximating the Pareto-optimal solution set rather than computing it exactly (given a user-provided approximation factor). As the correlation between objective functions increases, smaller approximation factors are sufficient for collapsing the entire Pareto-optimal set into a single solution. We leverage this insight to propose an efficient algorithm that reduces the search effort in the presence of correlated objectives. Our approach for computing approximations of the entire Pareto-optimal set is inspired by graph-clustering algorithms. It uses a preprocessing phase to identify correlated clusters within a graph and to generate a new graph representation. This allows a natural generalization of A*pex to run up to five times faster on DIMACS dataset instances, a standard benchmark in the field. To the best of our knowledge, this is the first algorithm proposed that efficiently and effectively exploits correlations in the context of bi-objective search while providing theoretical guarantees on solution quality.
Accelerating Focal Search in Multi-Agent Path Finding with Tighter Lower Bounds
ArXiv.org · 2025-03-04
preprintOpen accessSenior authorMulti-Agent Path Finding (MAPF) involves finding collision-free paths for multiple agents while minimizing a cost function--an NP-hard problem. Bounded suboptimal methods like Enhanced Conflict-Based Search (ECBS) and Explicit Estimation CBS (EECBS) balance solution quality with computational efficiency using focal search mechanisms. While effective, traditional focal search faces a limitation: the lower bound (LB) value determining which nodes enter the FOCAL list often increases slowly in early search stages, resulting in a constrained search space that delays finding valid solutions. In this paper, we propose a novel bounded suboptimal algorithm, double-ECBS (DECBS), to address this issue by first determining the maximum LB value and then employing a best-first search guided by this LB to find a collision-free path. Experimental results demonstrate that DECBS outperforms ECBS in most test cases and is compatible with existing optimization techniques. DECBS can reduce nearly 30% high-level CT nodes and 50% low-level focal search nodes. When agent density is moderate to high, DECBS achieves a 23.5% average runtime improvement over ECBS with identical suboptimality bounds and optimizations.
2025-10-19
articleSenior authorModern automated factories increasingly run manufacturing procedures using a matrix of programmable machines, such as 3D printers, interconnected by a programmable transport system, such as a fleet of tabletop robots. To embed a manufacturing procedure into a smart factory, an operator must: (a) assign each of its processes to a machine and (b) specify how agents should transport parts between machines. The problem of embedding a manufacturing process into a smart factory is termed the Smart Factory Embedding (SFE) problem. State-of-the-art SFE solvers can only scale to factories containing a couple dozen machines. Modern smart factories, however, may contain hundreds of machines. We fill this hole by introducing the first highly scalable solution to the SFE, TSACES, the Traffic System based Anytime Cyclic Embedding Solver. We show that TS-ACES is complete and can scale to SFE instances based on real industrial scenarios with more than a hundred machines.
New Mechanisms in Flex Distribution for Bounded Suboptimal Multi-Agent Path Finding
Proceedings of the International Symposium on Combinatorial Search · 2025-07-19
articleOpen accessSenior authorMulti-Agent Path Finding (MAPF) is the problem of finding a set of collision-free paths, one for each agent in a shared environment. Its objective is to minimize the sum of path costs (SOC), where the path cost of each agent is defined as the travel time from its start location to its target location. Explicit Estimation Conflict-Based Search (EECBS) is the leading algorithm for bounded-suboptimal MAPF, with the SOC of the solution being at most a user-specified factor w away from optimal. EECBS maintains sets of paths and a lower bound LB on the optimal SOC. Then, it iteratively selects a set of paths whose SOC is at most w times LB and introduces constraints to resolve collisions. For each path in a set, EECBS maintains a lower bound on its optimal path that satisfies constraints. By finding a path with cost at most its threshold, defined as w times its lower bound, EECBS guarantees to find a bounded-suboptimal solution. To speed up EECBS, previous work uses flex distribution to relax the requirement that each path needs to be at most its threshold. Though EECBS with flex distribution guarantees to find a bounded-suboptimal solution, increasing the thresholds may increase the SOC beyond w times LB, forcing EECBS to switch among different sets of paths (whose SOC are still at most w times LB), and thus reducing efficiency. To address this issue, we propose Conflict-Based Flex Distribution that distributes flex in proportion to the number of collisions. We also estimate the extra travel time (i.e., delays) needed to satisfy constraints and propose Delay-Based Flex Distribution. On top of that, we propose Mixed-Strategy Flex Distribution, combining both in a hierarchical framework. We prove that EECBS with our new flex distribution mechanisms is complete and bounded-suboptimal. The experiments show that our approaches outperform the original (greedy) flex distribution. Also, we redesign Focal-A* search from the previous work to improve LB for a congested environment.
Recent grants
CPS: Small: Novel Algorithmic Techniques for Drone Flight Planning on a Large Scale
NSF · $500k · 2018–2024
NSF-BSF:RI:Small:Collaborative Research:Next-Generation Multi-Agent Path Finding Algorithms
NSF · $313k · 2018–2024
RI: Medium: Collaborative Research: Experience-Based Planning: A Framework for Lifelong Planning
NSF · $348k · 2014–2021
S&AS: FND: Long-Term Planning and Robust Plan Execution for Multi-Robot Systems
NSF · $600k · 2017–2023
NSF · $453k · 2013–2019
Frequent coauthors
- 252 shared
T. K. Satish Kumar
- 215 shared
Jiaoyang Li
- 159 shared
Hang Ma
Simon Fraser University
- 104 shared
Liron Cohen
Ben-Gurion University of the Negev
- 102 shared
Tansel Uras
University of Southern California
- 89 shared
Daniel Harabor
Australian Regenerative Medicine Institute
- 83 shared
Ariel Felner
Ben-Gurion University of the Negev
- 78 shared
Peter J. Stuckey
Awards & honors
- Fellow of the Association for the Advancement of Artificial…
- Fellow of the Association for Computing Machinery (ACM)
- Fellow of the Institute of Electrical and Electronics Engine…
- Fellow of the American Association for the Advancement of Sc…
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Sven Koenig
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup