Geoffrey J Gordon

Professor, Associate Department Head

Carnegie Mellon University · Machine Learning Department

Active 1899–2025

h-index 52

Citations 13.0k

Papers 227 · 12 last 5y

Funding—


OpenAlex


About

Geoffrey J. Gordon is a professor in the Machine Learning Department at Carnegie Mellon University, with an affiliation to the Robotics Institute. His research interests encompass multi-agent planning, reinforcement learning, decision-theoretic planning, statistical models of complex data such as maps, video, and text, computational learning theory, and game theory. He leads the SELECT lab, which focuses on sensing, learning, and acting. Gordon has a background that includes a visiting professorship at Stanford Robotics Lab during the academic year 2003-2004, and prior industry experience at Burning Glass Technologies, where he worked on intelligent search and matching software for resumes and job postings. His academic journey includes a postdoctoral position at the AUTON lab in the Robotics Institute and a PhD in Computer Science under advisor Tom Mitchell.

Research topics

Artificial Intelligence
Machine Learning
Computer Science
Data Mining
Computer Security
Mathematics
Algorithm
Theoretical computer science
Statistics

Selected publications

A Vision for Computational Decarbonization of Societal Infrastructure
IEEE Internet Computing · 2025-03-01 · 4 citations
article
Modern society is at a critical inflection point with rapidly accelerating demand for energy due to growth in domestic manufacturing, datacenters, artificial intelligence (AI), electric vehicles, and electric heat pumps. Sustaining this growth while also reducing society’s carbon emissions will necessitate a shift beyond our long-standing focus on improving energy-efficiency to optimizing carbon-efficiency. This paper lays out a vision for a new field of Computational Decarbonization (CoDec), which focuses on optimizing and reducing the lifecycle carbon emissions of complex computing and societal infrastructure systems. We identify an important class of decarbonization problems that arise from interdependencies across multiple infrastructure domains, including computing, transportation, the built environment, and the electric power grid. As we discuss, solving these problems will require developing novel computational techniques, algorithms, systems, and AI methods that sense, optimize, and reduce the operational, embodied, and lifecycle greenhouse gas emissions of societal infrastructure over long temporal and spatial scales.
CurvGAD: Leveraging Curvature for Enhanced Graph Anomaly Detection
ArXiv.org · 2025-02-12
preprint · Open access
Does the intrinsic curvature of complex networks hold the key to unveiling graph anomalies that conventional approaches overlook? Reconstruction-based graph anomaly detection (GAD) methods overlook such geometric outliers, focusing only on structural and attribute-level anomalies. To this end, we propose CurvGAD - a mixed-curvature graph autoencoder that introduces the notion of curvature-based geometric anomalies. CurvGAD introduces two parallel pipelines for enhanced anomaly interpretability: (1) Curvature-equivariant geometry reconstruction, which focuses exclusively on reconstructing the edge curvatures using a mixed-curvature, Riemannian encoder and Gaussian kernel-based decoder; and (2) Curvature-invariant structure and attribute reconstruction, which decouples structural and attribute anomalies from geometric irregularities by regularizing graph curvature under discrete Ollivier-Ricci flow, thereby isolating the non-geometric anomalies. By leveraging curvature, CurvGAD refines the existing anomaly classifications and identifies new curvature-driven anomalies. Extensive experimentation over 10 real-world datasets (both homophilic and heterophilic) demonstrates an improvement of up to 6.5% over state-of-the-art GAD methods. The code is available at: https://github.com/karish-grover/curvgad.
LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning
arXiv (Cornell University) · 2024-07-22
preprint · Open access
Recent advances in reinforcement learning (RL) have predominantly leveraged neural network policies for decision-making, yet these models often lack interpretability, posing challenges for stakeholder comprehension and trust. Concept bottleneck models offer an interpretable alternative by integrating human-understandable concepts into policies. However, prior work assumes that concept annotations are readily available during training. For RL, this requirement poses a significant limitation: it necessitates continuous real-time concept annotation, which either places an impractical burden on human annotators or incurs substantial costs in API queries and inference time when employing automated labeling methods. To overcome this limitation, we introduce a novel training scheme that enables RL agents to efficiently learn a concept-based policy by only querying annotators to label a small set of data. Our algorithm, LICORICE, involves three main contributions: interleaving concept learning and RL training, using an ensemble to actively select informative data points for labeling, and decorrelating the concept data. We show how LICORICE reduces human labeling efforts to 500 or fewer concept labels in three environments, and 5000 or fewer in two more complex environments, all at no cost to performance. We also explore the use of VLMs as automated concept annotators, finding them effective in some cases but imperfect in others. Our work significantly reduces the annotation burden for interpretable RL, making it more practical for real-world applications that necessitate transparency.
Meta-Analysis with Untrusted Data
arXiv (Cornell University) · 2024-07-12
preprint · Open access · Senior author
[See paper for full abstract] Meta-analysis is a crucial tool for answering scientific questions. It is usually conducted on a relatively small amount of "trusted" data -- ideally from randomized, controlled trials -- which allow causal effects to be reliably estimated with minimal assumptions. We show how to answer causal questions much more precisely by making two changes. First, we incorporate untrusted data drawn from large observational databases, related scientific literature and practical experience -- without sacrificing rigor or introducing strong assumptions. Second, we train richer models capable of handling heterogeneous trials, addressing a long-standing challenge in meta-analysis. Our approach is based on conformal prediction, which fundamentally produces rigorous prediction intervals, but doesn't handle indirect observations: in meta-analysis, we observe only noisy effects due to the limited number of participants in each trial. To handle noise, we develop a simple, efficient version of fully-conformal kernel ridge regression, based on a novel condition called idiocentricity. We introduce noise-correcting terms in the residuals and analyze their interaction with a "variance shaving" technique. In multiple experiments on healthcare datasets, our algorithms deliver tighter, sounder intervals than traditional ones. This paper charts a new course for meta-analysis and evidence-based medicine, where heterogeneity and untrusted data are embraced for more nuanced and precise predictions.
Understanding and Mitigating Accuracy Disparity in Regression
arXiv (Cornell University) · 2021-02-24 · 5 citations
preprint · Open access
With the widespread deployment of large-scale prediction systems in high-stakes domains, e.g., face recognition, criminal justice, etc., disparity in prediction accuracy between different demographic subgroups has called for fundamental understanding on the source of such disparity and algorithmic intervention to mitigate it. In this paper, we study the accuracy disparity problem in regression. To begin with, we first propose an error decomposition theorem, which decomposes the accuracy disparity into the distance between marginal label distributions and the distance between conditional representations, to help explain why such accuracy disparity appears in practice. Motivated by this error decomposition and the general idea of distribution alignment with statistical distances, we then propose an algorithm to reduce this disparity, and analyze its game-theoretic optima of the proposed objective functions. To corroborate our theoretical findings, we also conduct experiments on five benchmark datasets. The experimental results suggest that our proposed algorithms can effectively mitigate accuracy disparity while maintaining the predictive power of the regression models.
Successor Feature Sets: Generalizing Successor Representations Across Policies
arXiv (Cornell University) · 2021-03-03
article · Open access · Senior author
Successor-style representations have many advantages for reinforcement learning: for example, they can help an agent generalize from past experience to new goals, and they have been proposed as explanations of behavioral and neural data from human and animal learners. They also form a natural bridge between model-based and model-free RL methods: like the former they make predictions about future experiences, and like the latter they allow efficient prediction of total discounted rewards. However, successor-style representations are not optimized to generalize across policies: typically, we maintain a limited-length list of policies, and share information among them by representation learning or GPI. Successor-style representations also typically make no provision for gathering information or reasoning about latent variables. To address these limitations, we bring together ideas from predictive state representations, belief space value iteration, successor features, and convex analysis: we develop a new, general successor-style representation, together with a Bellman equation that connects multiple sources of information within this representation, including different latent states, policies, and reward functions. The new representation is highly expressive: for example, it lets us efficiently read off an optimal policy for a new reward function, or a policy that imitates a new demonstration. For this paper, we focus on exact computation of the new representation in small, known environments, since even this restricted setting offers plenty of interesting questions. Our implementation does not scale to large, unknown environments -- nor would we expect it to, since it generalizes POMDP value iteration, which is difficult to scale. However, we believe that future work will allow us to extend our ideas to approximate reasoning in large, unknown environments.
Learning General Latent-Variable Graphical Models with Predictive Belief Propagation
Proceedings of the AAAI Conference on Artificial Intelligence · 2020-04-03
article · Open access · Senior author
Learning general latent-variable probabilistic graphical models is a key theoretical challenge in machine learning and artificial intelligence. All previous methods, including the EM algorithm and the spectral algorithms, face severe limitations that largely restrict their applicability and affect their performance. In order to overcome these limitations, in this paper we introduce a novel formulation of message-passing inference over junction trees named predictive belief propagation, and propose a new learning and inference algorithm for general latent-variable graphical models based on this formulation. Our proposed algorithm reduces the hard parameter learning problem into a sequence of supervised learning problems, and unifies the learning of different kinds of latent graphical models into a single learning framework, which is local-optima-free and statistically consistent. We then give a proof of the correctness of our algorithm and show in experiments on both synthetic and real datasets that our algorithm significantly outperforms both the EM algorithm and the spectral algorithm while also being orders of magnitude faster to compute.
Information Obfuscation of Graph Neural Networks
arXiv (Cornell University) · 2020 · 8 citations
- Computer Science
- Artificial Intelligence
While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representation learning in many applications, the neighborhood aggregation scheme exposes additional vulnerabilities to adversaries seeking to extract node-level information about sensitive attributes. In this paper, we study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data. We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance. Our method creates a strong defense against inference attacks, while only suffering small loss in task performance. Theoretically, we analyze the effectiveness of our framework against a worst-case adversary, and characterize an inherent trade-off between maximizing predictive accuracy and minimizing information leakage. Experiments across multiple datasets from recommender systems, knowledge graphs and quantum chemistry demonstrate that the proposed approach provides a robust defense across various graph structures and tasks, while producing competitive GNN encoders for downstream tasks.
An Empirical Investigation of Beam-Aware Training in Supertagging
2020-01-01
preprint · Open access · Senior author
Structured prediction is often approached by training a locally normalized model with maximum likelihood and decoding approximately with beam search. This approach leads to mismatches as, during training, the model is not exposed to its mistakes and does not use beam search. Beam-aware training aims to address these problems, but unfortunately, it is not yet widely used due to a lack of understanding about how it impacts performance, when it is most useful, and whether it is stable. Recently, Negrinho et al. [...] In this paper, we begin an empirical investigation: we train the supertagging model of Vaswani et al. [...] We explore the influence of various design choices and make recommendations for choosing them. We observe that beam-aware training improves performance for both models, with large improvements for the simpler model which must effectively manage uncertainty during decoding. Our results suggest that a model must be learned with search to maximize its effectiveness.

Frequent coauthors

Byron Boots
35 shared
Han Zhao
27 shared
Sebastian Thrun
17 shared
Ahmed Hefny
14 shared
Sajid M. Siddiqi
13 shared
David Yaron
Carnegie Mellon University
10 shared
J. Andrew Bagnell
10 shared
Carlton Downey
9 shared

Labs

Wellness Network · PI

Education

Ph.D., Computer Science
Carnegie Mellon University
Other
AUTON lab
Other
Carnegie Mellon University
