Satinder Singh Baveja

Professor, EECS – Computer Science and Engineering

University of Michigan · Computer Science and Engineering

Active 1998–2025

h-index: 4
Citations: 77
Papers: 73 last 5y
Funding: $275k

About

Satinder Singh Baveja is a Professor in the Department of Electrical Engineering and Computer Science (EECS) at the University of Michigan. His research interests include Reinforcement Learning, Machine Learning, Computational Game Theory, and Adaptive Human-Computer Interaction. He is a key faculty member within the Michigan AI Lab, contributing to advancements in artificial intelligence through his focus on developing intelligent systems that learn and adapt in complex environments. His work emphasizes the integration of machine learning techniques with human-centered applications, aiming to create more responsive and responsible AI systems.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Machine Learning
  • Engineering
  • Human–computer interaction
  • Computer Security
  • Natural Language Processing
  • Psychology
  • Social psychology
  • Programming language
  • World Wide Web
  • Cognitive science

Selected publications

  • SIMA 2: A Generalist Embodied Agent for Virtual Worlds

    ArXiv.org · 2025-12-04

    preprint · Open access

    We introduce SIMA 2, a generalist embodied agent that understands and acts in a wide variety of 3D virtual worlds. Built upon a Gemini foundation model, SIMA 2 represents a significant step toward active, goal-directed interaction within an embodied environment. Unlike prior work (e.g., SIMA 1) limited to simple language commands, SIMA 2 acts as an interactive partner, capable of reasoning about high-level goals, conversing with the user, and handling complex instructions given through language and images. Across a diverse portfolio of games, SIMA 2 substantially closes the gap with human performance and demonstrates robust generalization to previously unseen environments, all while retaining the base model's core reasoning capabilities. Furthermore, we demonstrate a capacity for open-ended self-improvement: by leveraging Gemini to generate tasks and provide rewards, SIMA 2 can autonomously learn new skills from scratch in a new environment. This work validates a path toward creating versatile and continuously learning agents for both virtual and, eventually, physical worlds.

  • Vision-Language Models as a Source of Rewards

    arXiv (Cornell University) · 2023 · 3 citations

    • Computer Science
    • Artificial Intelligence

    Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.

  • Human-Timescale Adaptation in an Open-Ended Task Space

    arXiv (Cornell University) · 2023 · 22 citations

    • Computer Science
    • Artificial Intelligence

    Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent's capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.

  • Learning to Learn End-to-End Goal-Oriented Dialog From Related Dialog Tasks

    2021 · 1 citation

    Senior author · Corresponding
    • Computer Science
    • Artificial Intelligence

    For each goal-oriented dialog task of interest, large amounts of data need to be collected for end-to-end learning of a neural dialog system. Collecting that data is a costly and time-consuming process. Instead, we show that we can use only a small amount of data, supplemented with data from a related dialog task. Naively learning from related data fails to improve performance as the related data can be inconsistent with the target task. We describe a meta-learning based method that selectively learns from the related dialog task data. Our approach leads to significant accuracy improvements in an example dialog task.

  • Multi-Stage Attack Graph Security Games

    2017-10-30 · 28 citations

    article · Senior author

    We study the problem of allocating limited security countermeasures to protect network data from cyber-attacks, for scenarios modeled by Bayesian attack graphs. We consider multi-stage interactions between a network administrator and cybercriminals, formulated as a security game. This formulation is capable of representing security environments with significant dynamics and uncertainty, and very large strategy spaces. For the game model, we propose parameterized heuristic strategies for both players. Our heuristics exploit the topological structure of the attack graphs and employ different sampling methodologies to overcome the computational complexity in determining players' actions. Given the complexity of the game, we employ a simulation-based methodology, and perform empirical game analysis over an enumerated set of these heuristic strategies. Finally, we conduct experiments based on a variety of game settings to demonstrate the advantages of our heuristics in obtaining effective defense strategies which are robust to the uncertainty of the security environment.

  • The optimal reward problem: designing effective reward for bounded agents

    2011-01-01 · 13 citations

    dissertation · 1st author · Corresponding

    In the field of reinforcement learning, agent designers build agents which seek to maximize reward. In standard practice, one reward function serves two purposes. It is used to evaluate the agent and is used to directly guide agent behavior in the agent's learning algorithm. This dissertation makes four main contributions to the theory and practice of reward function design. The first is a demonstration that if an agent is bounded—if it is limited in its ability to maximize expected reward—the designer may benefit by considering two reward functions. A designer reward function is used to evaluate the agent, while a separate agent reward function is used to guide agent behavior. The designer can then solve the Optimal Reward Problem (ORP): choose the agent reward function which leads to the greatest expected reward for the designer. The second contribution is the demonstration through examples that good reward functions are chosen by assessing an agent's limitations and how they interact with the environment. An agent which maintains knowledge of the environment in the form of a Bayesian posterior distribution, but lacks adequate planning resources, can be given a reward proportional to the variance of the posterior, resulting in provably efficient exploration. An agent with poor modeling assumptions can be punished for visiting the areas of the state space it has trouble modeling, resulting in better performance. The third contribution is the Policy Gradient for Reward Design (PGRD) algorithm, a convergent gradient ascent algorithm for learning good reward functions. Experiments in multiple environments demonstrate that using PGRD for reward optimization yields better agents than using the designer's reward directly as the agent's reward. It also outperforms the use of an evaluation function at the leaf-states of the planning tree. Finally, this dissertation shows that the ORP differs from the popular work on potential-based reward shaping. 
Shaping rewards are constrained by properties of the environment and the designer's reward function, but they generally are defined irrespective of properties of the agent. The best shaping reward functions are suboptimal for some agents and environments.

  • On predictive linear Gaussian models

    2009-01-01 · 10 citations

    dissertation · 1st author · Corresponding

    Models are used by artificial agents to make predictions about the future; agents then use these predictions to modify their behavior. In many cases, these models are not known a priori and so the agent must learn a model through experience with a system. At the core of most models is the concept of state—an estimate of the current situation of the world from which a model's predictions are derived. A recent development in the study of models is the predictive state model. Predictive state models use predictions about potential future events as their state, as opposed to unobserved, unobservable variables, as in most traditional models. For example, a traditional model may represent a robot's location using latitude and longitude, which is unobservable without a GPS unit. A predictive state model of the same robot might represent its location with two events like “If I traveled forward 4 feet I would hit a wall” and “If I turned right and traveled forward 6 feet I would move into a hallway.” This dissertation presents two models that expand the limits of predictive state models, which had mostly modeled dynamical systems with discrete, scalar-valued observations, with linear predictions of future events. The first model, the e-test predictive state representation (EPSR), is the first nonlinear predictive state model that can be used to model a large class of dynamical systems. The EPSR models deterministic systems with discrete actions and observations, and is sometimes exponentially smaller than the equivalent model with linear predictions. The second model is the predictive linear Gaussian model (PLG), which models dynamical systems with continuous vector-valued actions and observations. 
I present theoretical results that show the PLG is representationally equivalent to the linear dynamical system (LDS), a popular traditional model, and that the parameter estimation algorithm I present is consistent—that is, in the limit of infinite data, it produces a correct model. I also apply this algorithm to (a) a number of artificial, randomly generated systems and (b) a real-world traffic prediction problem; and show that it performs well compared to expectation maximization, a parameter estimation algorithm for the LDS.

  • Exponential Family Predictive Representations of State

    Deep Blue (University of Michigan) · 2007-12-03 · 18 citations

    article · Open access · Senior author

    To my wife, Martha. Acknowledgments: This work would not have been possible without generous help, both intellectually and financially. I am grateful to my advisor, Satinder Singh, for the long discussions we have had as he has patiently taught me to think clearly through my own ideas, sharpen my writing, and to raise my sights. A special thanks also to my lab mates, Matt Rudary, Britton Wolfe, Vishal Soni, Erik Talviti, Jonathan Sorg and Ishan Chaudhuri for always letting me bounce ideas around, for listening, and for patient tutoring. Thanks to Andrew Nuxoll for being a kindred spirit, to Nick Gorski for the occasional foosball game and to my collaborators at the University of Alberta. Finally, I would like to gratefully acknowledge the National Science Foundation for financially supporting me through most of my studies with a Graduate Research Fellowship. Finally, a special thank you to my wife Martha for her love, her constancy, her feistiness and for always keeping me on the straight and narrow. Thank you, Grace, Peterson and Andrew for reminding

  • Using predictions for planning and modeling in stochastic environments

    Deep Blue (University of Michigan) · 2005-01-01 · 12 citations

    article · Senior author

    The problem of defining and working with models of systems that change with time is common to many disciplines. Within artificial intelligence, it is common to provide a computer-based agent with models (or the facility for building models) so that it can learn about, and make informed decisions about, the environment within which it exists. This is especially challenging when the environment exhibits both stochasticity and partial observability. A commonality among many different types of models is that they are able to make predictions (probabilistic or otherwise) about future outcomes. These predictions play a central role in the agent's methods for decision-making (planning) and learning. This thesis develops a recently introduced approach to modeling, in which predictions serve as the model's representation of its current state. A general framework for building models, called the predictive state representation (PSR), is examined in depth, and theoretical results and algorithms are developed for PSRs, laying the foundation for building models using predictive representations of state. PSRs are examined in terms of their expressive power, by examining the class of environments that can be modeled using PSRs as compared to other common approaches for building models. It is shown that PSRs are at least as expressive as many other common approaches. Algorithms are developed that leverage the predictive representation of state in order to learn a PSR model based on the agent's experience with the environment. Furthermore, techniques are developed to allow an agent to make optimal decisions about its behavior, in the context of sequential decision problems, where any choice may have far-reaching consequences. In addition, an extension of PSRs is presented, which incorporates a memory of the past with predictions about the future. Learning and decision-making algorithms are also developed for these memory-PSRs. The work in this dissertation lays the groundwork for how predictive representations of state may be used for building models, by examining the expressive power of these models, and by developing algorithms and the necessary theoretical results for learning and planning.

  • Modeling temporal structure of time series with hidden Markov experts

    1998-01-01 · 2 citations

    article · Senior author

    This thesis explored using hidden Markov models for modeling time series and applied the model to both point forecasting and density forecasting. Traditionally, hidden Markov models are used in speech recognition, where prediction is not the concern. In this thesis, we used the algorithm to do both point forecasting and density forecasting. This thesis contains both theoretical and empirical analysis of the methods. The studies lead to an understanding of the method's behavior and possible applications to financial time series. Hidden Markov Experts extend directly from hidden Markov models. In the proposed model, each expert can be linear or nonlinear. Based on the likelihood function, we discussed the EM algorithm for Hidden Markov Experts. We used the model in point forecasting and in regime recovery. For the computer-simulated data, the new algorithm found the correct parameters and recovered the regimes that generated the data. Compared to gated experts, the new approach is more powerful in modeling regime-switching time series. The new algorithm was also applied to real-world financial data: modeling both high-frequency foreign exchange data and daily S&P 500 data. The regimes retrieved by Hidden Markov Experts were found to correspond to volatility clustering. This makes further applications of Hidden Markov Experts in option pricing a very interesting topic for future study. Hidden Markov Experts are also used to predict the conditional density of time series. Switching models were mainly used in economics to predict only the conditional mean of time series. In this thesis, we applied Hidden Markov Experts to construct density forecasts by assuming the density is a mixture of Gaussians. From the simulated experiment, we can see that Hidden Markov Experts can predict the density correctly under the criterion of the probability integral transform method. We also applied this approach to the S&P 500 data. It is important to see that even when it is hard to predict the conditional mean, the algorithm still significantly improves the density forecasts.

Frequent coauthors

  • Tim Rocktäschel

    2 shared
  • Kate Baumli

    2 shared
  • Jack Parker-Holder

    2 shared
  • Feryal Behbahani

    2 shared
  • Yannick Schroecker

    2 shared
  • Richie Steigerwald

    1 shared
  • Kristian Holsheimer

    1 shared
  • Gheorghe Comanici

    Google (United States)

    1 shared

Labs

  • Michigan AI Lab · PI
