Guy Van den Broeck

Professor · Verified

University of California, Los Angeles · Computer Science

Active 2009–2026

h-index: 32
Citations: 3.6k
Papers: 288 (120 in the last 5 years)
Funding: $1.5M

About

Guy Van den Broeck is a professor in the Department of Computer Science at UCLA Samueli School of Engineering. His research focuses on machine learning, knowledge representation and reasoning, and applications of probabilistic reasoning and learning within artificial intelligence. He has received numerous awards for his contributions, including the NSF CAREER Award, the IJCAI 2019 Computers and Thought Award, and the Research Council Award from KU Leuven. His work has been recognized for advancing the understanding and development of AI models capable of reasoning and learning, with recent media coverage highlighting trends in reasoning AI models and their applications in designing medical websites.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Theoretical computer science
  • Algorithm
  • Mathematics
  • Quantum mechanics
  • Physics
  • Discrete mathematics
  • Statistics

Selected publications

  • Enabling Autoregressive Models to Fill In Masked Tokens

    Underline Science Inc. · 2026-03-06

    Other · Open access

    Historically, LLMs have been trained using either autoregressive (AR) or masked language modeling (MLM) objectives, with AR models gaining dominance in recent years. However, AR models are inherently incapable of masked infilling, which is the ability to predict masked tokens between past and future context. In contrast, MLM models suffer from intrinsic computational inefficiencies during both training and inference that hinder their scalability. This work introduces MARIA (Masked and Autoregressive Infilling Architecture), a novel approach that leverages the strengths of both paradigms to achieve state-of-the-art masked infilling performance. MARIA combines a pre-trained MLM and AR model by training a linear decoder that takes their concatenated hidden states as input. This minimal modification enables the AR model to perform infilling while retaining its inherent advantages in terms of faster inference with KV caching. Our results demonstrate that MARIA significantly outperforms existing methods, namely discrete diffusion models, on masked infilling tasks.

  • A canonical generalization of OBDD

    arXiv (Cornell University) · 2026-04-07

    Article · Open access · Senior author

    We introduce Tree Decision Diagrams (TDD) as a model for Boolean functions that generalizes OBDD. They can be seen as a restriction of structured d-DNNF; that is, d-DNNF that respect a vtree $T$. We show that TDDs enjoy the same tractability properties as OBDD, such as model counting, enumeration, conditioning, and apply, and are more succinct. In particular, we show that CNF formulas of treewidth $k$ can be represented by TDDs of FPT size, which is known to be impossible for OBDD. We study the complexity of compiling CNF formulas into deterministic TDDs via bottom-up compilation and relate the complexity of this approach with the notion of factor width introduced by Bova and Szeider.

  • Enabling Autoregressive Models to Fill In Masked Tokens

    2026-01-01

    Article · Senior author

  • ESTroM: Element-Flow Architecture for Processing Sparse Tractable Probabilistic Models

    2026-01-31

    Article

    Probabilistic circuits (PCs) are an emerging, popular class of tractable probabilistic models. Their internal connections are represented as directed acyclic graphs (DAGs) with sum nodes and product nodes, ensuring parameter efficiency and model expressiveness in terms of probabilistic inference. Despite these algorithmic advantages, executing PCs still faces graph structure deployment issues. PyJuice on GPU with block-sparse parallel computation methods causes a parallelism-sparsity gap, while DAG-style processing does not take advantage of the repetitive characteristics of PC internal nodes, resulting in low throughput. To address this challenge, this work proposes ESTroM, an efficient architecture that provides novel graph-element (nodes/edges) parallelism with sparsity-aware compilation. Through analysis of the sum/product node computational requirements, the ESTroM core uses compressed matrices for sum/product node DAG representations, edge-based dataflow for product node processing, and node-based dataflow for sum node processing. With intra-core rewind and inter-core multicast optimizations, we develop a prototype ESTroM chip and a demonstrative system for a PC-based neural lossless compression application. Our ablation experiments show ESTroM offers a speed improvement of $2.11 \sim 3.79 \times$ compared to the state-of-the-art DAG processing unit (DPU)-v2 with the same computing resources. Under various typical PC structures, ESTroM achieves a speedup of $18.7 \times$ compared to DPU-v2 and $3.9 \times$ compared to an NVIDIA RTX 4090 GPU with the PyJuice framework. In terms of neural lossless compression, ESTroM demonstrates a $1.39 \times$ improvement in compression ratio compared to the industry-standard Zstandard (Zstd) algorithm at the highest compression level, while offering a $16.3 \sim 65.2 \times$ improvement in compression speed compared to Zstd on an Intel Xeon Gold 6230. In a nutshell, this work develops novel graph element parallelism and element-flow architecture theory with practical prototype chips and systems, revealing a new hardware-perspective path for the “scaling law” of emerging tractable probabilistic models.

  • Controllable Generation via Locally Constrained Resampling

    2026-03-20

    Other · Senior author

  • Probabilistic Programs of Thought

    arXiv (Cornell University) · 2026-04-19

    Preprint · Open access · Senior author

    LLMs are widely used for code generation and mathematical reasoning tasks where they are required to generate structured output. They either need to reason about code, generate code for a given specification, or reason using programs of thought. The typical approach to code generation is to prompt the model and generate samples until an appropriate program is obtained. Within this process, sampling $n$ programs from the language model requires $n$ GPU compute-intensive generations which becomes prohibitively expensive for larger values of $n$. In this work, we address this limitation by exposing the LLM's distribution within the generated programs themselves. We propose a novel test-time framework we dub probabilistic programs of thought to obtain more samples from the model with fewer LLM generations. Given a program generated by a model and the associated next-token probabilities, we build a probabilistic program that compactly represents exponentially many deterministic programs. Since performing probabilistic reasoning in this probabilistic program is much cheaper, our approach allows sampling new programs without any additional GPU compute and little CPU overhead. We instantiate our approach on benchmarks for code generation, code understanding and mathematical reasoning and report improvements in performance with fewer generations from the LLM.

  • Adversarial Tokenization

    ArXiv.org · 2025-03-04

    Preprint · Open access · Senior author

    Current LLM pipelines account for only one possible tokenization for a given string, ignoring exponentially many alternative tokenizations during training and inference. For example, the standard Llama3 tokenization of penguin is [p,enguin], yet [peng,uin] is another perfectly valid alternative. In this paper, we show that despite LLMs being trained solely on one tokenization, they still retain semantic understanding of other tokenizations, raising questions about their implications in LLM safety. Put succinctly, we answer the following question: can we adversarially tokenize an obviously malicious string to evade safety and alignment restrictions? We show that not only is adversarial tokenization an effective yet previously neglected axis of attack, but it is also competitive against existing state-of-the-art adversarial approaches without changing the text of the harmful request. We empirically validate this exploit across three state-of-the-art LLMs and adversarial datasets, revealing a previously unknown vulnerability in subword models.

  • TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation

    ArXiv.org · 2025-04-25

    Preprint · Open access · Senior author

    As large language models (LMs) advance, there is an increasing need to control their outputs to align with human values (e.g., detoxification) or desired attributes (e.g., personalization, topic). However, autoregressive models focus on next-token predictions and struggle with global properties that require looking ahead. Existing solutions either post-train LMs for each new attribute--expensive and inflexible--or approximate the Expected Attribute Probability (EAP) of future sequences by sampling or training, which is slow and unreliable for rare attributes. We introduce TRACE (Tractable Probabilistic Reasoning for Adaptable Controllable gEneration), a novel framework that efficiently computes EAP and adapts to new attributes through tractable probabilistic reasoning and lightweight control. TRACE distills a Hidden Markov Model (HMM) from an LM and pairs it with a small classifier to estimate attribute probabilities, enabling exact EAP computation over the HMM's predicted futures. This EAP is then used to reweigh the LM's next-token probabilities for globally compliant continuations. Empirically, TRACE achieves state-of-the-art detoxification results with only 20% decoding overhead, yields 76 low-resource personalized LMs within seconds, and seamlessly extends to composite attributes. Our code is available at: https://github.com/yidouweng/trace.
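The Adversarial Tokenization abstract above observes that one string admits many valid tokenizations, such as [p,enguin] and [peng,uin] for "penguin". The following is a minimal sketch of that idea only, not the authors' method: it enumerates every split of a string over a small vocabulary. The vocabulary `TOY_VOCAB` and the function `tokenizations` are hypothetical names introduced for illustration and do not reflect the real Llama3 vocabulary.

```python
from functools import lru_cache

# Hypothetical toy vocabulary for illustration; NOT the real Llama3 vocabulary.
TOY_VOCAB = frozenset({"p", "e", "n", "g", "u", "i", "peng", "uin", "enguin"})


def tokenizations(text, vocab=TOY_VOCAB):
    """Return every way to split `text` into tokens drawn from `vocab`."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Base case: the whole string is consumed, so there is one empty split.
        if start == len(text):
            return ((),)
        results = []
        # Try every vocabulary token that begins at position `start`.
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in vocab:
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(tokens) for tokens in split_from(0)]


all_splits = tokenizations("penguin")
for tokens in all_splits:
    print(tokens)
```

Even this tiny vocabulary yields several distinct splits of the same string, and the count can grow exponentially with string length on a realistic subword vocabulary, which is the space of alternatives the abstract refers to.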

Awards & honors

  • Intel Outstanding Researcher Award
  • NSF CAREER Award
  • IJCAI 2019 Computers and Thought Award
  • Research Council Award, KU Leuven
  • LLD 2017