
Armando Solar-Lezama
Massachusetts Institute of Technology · Electrical Engineering & Computer Science
Active 2005–2026
About
Armando Solar-Lezama is a Professor of Computing at the MIT Schwarzman College of Computing and a faculty member of the EECS department, where he holds the title of Distinguished Professor of Computing. His research areas include Programming Languages and Software Engineering, and Systems and Networking. His work focuses on the analysis and synthesis of computational systems, particularly systems that interact with the external world through perception, communication, and action while learning, making decisions, and adapting to changing environments.
Research topics
- Computer Science
- Artificial Intelligence
- Programming languages
- Machine Learning
- Computational biology
- Data science
- Theoretical computer science
- Biology
Selected publications
Linear temporal constraints for sketch-based synthesizers
Formal Methods in System Design · 2026-05-12
Article · Open access · Senior author
Sketch-based program synthesis allows users to guide the synthesizer by writing partial programs (sketches). Traditionally, the specifications of these sketches are safety properties expressed either as assertions or as semantic equivalences. These specifications, however, lack expressiveness when the user wants to establish how the execution of the desired program evolves over time. This is especially important when synthesizing reactive programs, where the whole specification is about how the computation evolves over time. We explore an alternative method that lets the user specify desired program executions as linear temporal logic (LTL) formulae. We define a method for transforming sketches with LTL assertions into sketches with only standard assertions. Specifically, for terminating programs, our method transforms the LTL formulae into runtime monitors and implements them within the sketch. For non-terminating programs, our procedure defines and implements, within the sketch, fairness conditions (analyzing when an acceptance state of the Büchi automata equivalent to these LTL formulae occurs infinitely often). We prove the correctness of both constructions. Evaluation of our implementation in Sketch shows that our method enables the system to synthesize non-terminating programs, such as a round-robin arbiter for a variable number of devices and a lift controller for a variable number of floors. For terminating programs, our approach improves synthesizer performance by restricting the search over program candidates without modifying the structure of the sketches.
VLMaterial: Procedural Material Generation with Large Vision-Language Models
ArXiv.org · 2025-01-27 · 1 citation
Preprint · Open access
Procedural materials, represented as functional node graphs, are ubiquitous in computer graphics for photorealistic material appearance design. They allow users to perform intuitive and precise editing to achieve desired visual appearances. However, creating a procedural material given an input image requires professional knowledge and significant effort. In this work, we leverage the ability to convert procedural materials into standard Python programs and fine-tune a large pre-trained vision-language model (VLM) to generate such programs from input images. To enable effective fine-tuning, we also contribute an open-source procedural material dataset and propose to perform program-level augmentation by prompting another pre-trained large language model (LLM). Through extensive evaluation, we show that our method outperforms previous methods on both synthetic and real-world examples.
Stochastic Lazy Knowledge Compilation for Inference in Discrete Probabilistic Programs
Proceedings of the ACM on Programming Languages · 2025-06-10 · 1 citation
Article · Open access
We present new techniques for exact and approximate inference in discrete probabilistic programs, based on two new ways of exploiting lazy evaluation. First, we show how knowledge compilation, a state-of-the-art technique for exact inference in discrete probabilistic programs, can be made lazy, enabling asymptotic speed-ups. Second, we show how a probabilistic program’s lazy semantics naturally give rise to a division of its random choices into subproblems, which can be solved in sequence by sequential Monte Carlo with locally optimal proposals automatically computed via lazy knowledge compilation. We implement our approach in a new tool, Pluck, and evaluate its performance against state-of-the-art approaches to inference in discrete probabilistic languages. We find that on a suite of inference benchmarks, lazy knowledge compilation can be faster than state-of-the-art approaches, sometimes by orders of magnitude.
Challenges and Paths Towards AI for Software Engineering
ArXiv.org · 2025-03-28 (also posted on Qeios · 2025-04-04 · 5 citations)
Preprint · Open access · Senior author
AI for software engineering has made remarkable progress recently, becoming a notable success within generative AI. Despite this, there are still many challenges that need to be addressed before automated software engineering reaches its full potential. It should be possible to reach high levels of automation where humans can focus on the critical decisions of what to build and how to balance difficult tradeoffs while most routine development effort is automated away. Reaching this level of automation will require substantial research and engineering efforts across academia and industry. In this paper, we aim to discuss progress towards this in a threefold manner. First, we provide a structured taxonomy of concrete tasks in AI for software engineering, emphasizing the many other tasks in software engineering beyond code generation and completion. Second, we outline several key bottlenecks that limit current approaches. Finally, we provide an opinionated list of promising research directions toward making progress on these bottlenecks, hoping to inspire future research in this rapidly maturing field.
Randomly Sampled Language Reasoning Problems Elucidate Limitations of In-Context Learning
arXiv (Cornell University) · 2025-01-06
Preprint · Open access · Senior author
While LLMs have revolutionized the field of machine learning due to their high performance on a strikingly wide range of problems, they are also known to hallucinate false answers and underperform on less canonical versions of the same tasks. There are several emerging theories of LLM performance, among them that LLMs lack world modeling ability, that they have an undesirable bias towards an autoregressive prior, and that they struggle on more novel problems. The existing literature on LLM input novelty has focused on tasks of relatively high complexity, studying perturbations of canonical but complex problems. In this paper, we attempt to minimize complexity in order to isolate novelty as a factor in LLM underperformance and investigate the power of in-context learning (ICL). To this end, we consider an extremely simple domain: next-token prediction on simple language tasks. The twist is that these language tasks are wholly unseen, as they are randomly drawn from a large, parsimoniously defined set of languages arising from simple grammar rules. This experimental setup allows us to evaluate ICL independently of models' parametric knowledge. We find that LLMs uniformly underperform n-gram models on this task, both when used as next-token predictors and with chain-of-thought.
EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths
ArXiv.org · 2025-12-03
Preprint · Open access
We introduce a new approach to agent programming, that is, the development of LLM-based agents. Current approaches to agent programming often entangle two aspects of agent design: the core workflow logic and the inference-time strategy (e.g., tree search). We introduce "probabilistic angelic nondeterminism" ("PAN"), a programming model that disentangles these two concerns, allowing the programmer to describe the agent workflow and independently experiment with different inference-time strategies by simply changing a few inputs. We provide an implementation of PAN in Python as the EnCompass framework, which uses a Python decorator to compile agent workflow programs into a search space. We present three case studies that demonstrate how the framework lets the programmer quickly improve the reliability of an agent and easily switch between different inference-time strategies, all with little additional coding.
Peepco: Batch-Based Consistency Optimization
Proceedings of the ACM on Programming Languages · 2025-04-09
Article · Open access · Senior author
We present batch-based consistency, a new approach to consistency optimization that allows programmers to specialize consistency with application-level integrity properties. We implement the approach in two steps: we statically infer optimal consistency requirements for executions of bounded sets of operations, and then use the inferred requirements to parameterize a new distributed protocol that relaxes operation reordering at run time when it is safe to do so. Our approach supports standard notions of consistency. We implement batch-based consistency in Peepco, demonstrate its expressiveness for partial data replication, and examine Peepco’s run-time performance impact in different settings.
MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
ArXiv.org · 2025-02-23
Preprint · Open access
As AI becomes more closely integrated with people's daily activities, socially intelligent AI that can understand and interact seamlessly with humans in daily life is increasingly important. However, current work on AI social reasoning relies on language-only or language-dominant approaches to benchmarking and training models, resulting in systems that are improving in verbal communication but struggle with nonverbal social understanding. To address this limitation, we tap into a novel data source rich in nonverbal social interactions: mime videos. Mime is the art of expression through gesture and movement without spoken words, which presents unique challenges and opportunities in interpreting nonverbal social communication. We contribute a new dataset called MimeQA, obtained by sourcing ~8 hours of video clips from YouTube and developing a comprehensive video question-answering benchmark comprising 806 carefully annotated and verified question-answer pairs, designed to probe nonverbal social reasoning capabilities. Using MimeQA, we evaluate state-of-the-art video large language models (VideoLLMs) and find that they achieve low accuracy, generally ranging from 20–30%, while humans score 86%. Our analysis reveals that VideoLLMs often fail to ground imagined objects and over-rely on the text prompt while ignoring subtle nonverbal interactions. We hope to inspire future work in AI models that embody true social intelligence capable of interpreting nonverbal human interactions.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
arXiv (Cornell University) · 2024-03-12 · 23 citations
Preprint · Open access
Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry. However, as new and improved LLMs are developed, existing evaluation benchmarks (e.g., HumanEval, MBPP) are no longer sufficient for assessing their capabilities. In this work, we propose LiveCodeBench, a comprehensive and contamination-free evaluation of LLMs for code, which continuously collects new problems over time from contests across three competition platforms, namely LeetCode, AtCoder, and CodeForces. Notably, our benchmark also focuses on a broader range of code-related capabilities, such as self-repair, code execution, and test output prediction, beyond just code generation. Currently, LiveCodeBench hosts four hundred high-quality coding problems that were published between May 2023 and May 2024. We have evaluated 18 base LLMs and 34 instruction-tuned LLMs on LiveCodeBench. We present empirical findings on contamination, holistic performance comparisons, potential overfitting in existing benchmarks as well as individual model comparisons. We will release all prompts and model completions for further community analysis, along with a general toolkit for adding new scenarios and model
Recent grants
NSF · $232k · 2017–2019
SHF: Medium: Collaborative Research: Marrying program analysis and numerical search
NSF · $600k · 2012–2016
SHF: Small: Human-Centered Software Synthesis
NSF · $405k · 2011–2015
Expeditions: Collaborative Research: Understanding the World Through Code
NSF · $5.7M · 2020–2027
NSF · $500k · 2012–2017
Frequent coauthors
- Rishabh Singh · Texas A&M University · 44 shared
- Rastislav Bodík · Google (United States) · 40 shared
- Rajeev Alur · University of Pennsylvania · 31 shared
- Yewen Pu · 29 shared
- Dana Fisman · Yale University · 29 shared
- Sanjit A. Seshia · 28 shared
- Joshua B. Tenenbaum · Massachusetts Institute of Technology · 28 shared
- Liviu Tancau · University of California, Berkeley · 25 shared
Labs
MIT EECS Artificial Intelligence + Decision-making (PI)
Awards & honors
- Inaugural Distinguished Professor of Computing (2023)