Luca Carlone

Associate Professor

Massachusetts Institute of Technology · Aeronautics & Astronautics

Active 2008–2026

h-index: 60

Citations: 19.4k

Papers: 383 (223 in the last 5 years)

Funding: $580k (1 active)

About

Luca Carlone is an associate professor in the Department of Aeronautics and Astronautics at the Massachusetts Institute of Technology. His research interests include nonlinear estimation, numerical and distributed optimization, and learning and probabilistic inference, applied to sensing, perception, and decision making in single- and multi-robot systems. He specializes in the design of certifiable perception algorithms for high-integrity autonomous systems and in the development of algorithms and systems for real-time 3D scene understanding on mobile robotics platforms operating in the real world. He holds a Ph.D. (2012) from the Polytechnic University of Turin, master's degrees from the Polytechnic University of Turin and the Polytechnic University of Milan, both with highest honors, and a B.S. from the Polytechnic University of Turin. At MIT, his appointments and honors have included Boeing Career Development Associate Professor, Sloan Research Fellow, and Raymond L. Bisplinghoff Faculty Fellow. He was previously a postdoctoral fellow at Georgia Tech and a visiting scholar at the University of California, Santa Barbara and the University of Zaragoza. He is a senior member of the IEEE and an associate fellow of the AIAA. His awards include the Outstanding Systems Paper Award at RSS 2024, the IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award in 2023, and the AIAA Aeronautics and Astronautics Advising Award in 2022. His work centers on perception, estimation, and autonomous systems, with a focus on certifiable algorithms and scalable solutions for real-world robotic applications.

Research topics

Computer Science
Artificial Intelligence
Human–computer interaction
Machine Learning
Mathematical optimization
Algorithm
Mathematics
Geography
Engineering
Systems engineering
Archaeology

Selected publications

Mixed Diffusion for 3D Indoor Scene Synthesis
2026-03-06
preprint · Open access
Generating realistic 3D scenes is an area of growing interest in computer vision and robotics. However, creating high-quality, diverse synthetic 3D content often requires expert intervention, making it costly and complex. Recently, efforts to automate this process with learning techniques, particularly diffusion models, have shown significant improvements in tasks like furniture rearrangement. However, applying diffusion models to floor-conditioned indoor scene synthesis remains under-explored. This task is especially challenging as it requires arranging objects in continuous space while selecting from discrete object categories, posing unique difficulties for conventional diffusion methods. To bridge this gap, we present MiDiffusion, a novel mixed discrete-continuous diffusion model designed to synthesize plausible 3D indoor scenes given a floor plan and pre-arranged objects. We represent a scene layout by a 2D floor plan and a set of objects, each defined by category, location, size, and orientation. Our approach uniquely applies structured corruption across mixed discrete semantic and continuous geometric domains, resulting in a better-conditioned problem for denoising. Evaluated on the 3D-FRONT dataset, MiDiffusion outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis. Additionally, it effectively handles partial object constraints via a corruption-and-masking strategy without task-specific training, demonstrating advantages in scene completion and furniture arrangement tasks.
Towards Zero-Shot Point Cloud Registration Across Diverse Scales, Scenes, and Sensor Setups
arXiv (Cornell University) · 2026-01-06
preprint · Open access
Some deep learning-based point cloud registration methods struggle with zero-shot generalization, often requiring dataset-specific hyperparameter tuning or retraining for new environments. We identify three critical limitations: (a) fixed user-defined parameters (e.g., voxel size, search radius) that fail to generalize across varying scales, (b) learned keypoint detectors exhibit poor cross-domain transferability, and (c) absolute coordinates amplify scale mismatches between datasets. To address these three issues, we present BUFFER-X, a training-free registration framework that achieves zero-shot generalization through: (a) geometric bootstrapping for automatic hyperparameter estimation, (b) distribution-aware farthest point sampling to replace learned detectors, and (c) patch-level coordinate normalization to ensure scale consistency. Our approach employs hierarchical multi-scale matching to extract correspondences across local, middle, and global receptive fields, enabling robust registration in diverse environments. For efficiency-critical applications, we introduce BUFFER-X-Lite, which reduces total computation time by 43% (relative to BUFFER-X) through early exit strategies and fast pose solvers while preserving accuracy. We evaluate on a comprehensive benchmark comprising 12 datasets spanning object-scale, indoor, and outdoor scenes, including cross-sensor registration between heterogeneous LiDAR configurations. Results demonstrate that our approach generalizes effectively without manual tuning or prior knowledge of test domains. Code: https://github.com/MIT-SPARK/BUFFER-X.
ASHiTA: Automatic Scene-Grounded HIerarchical Task Analysis
2025-06-10 · 1 citation
article
While recent work in scene reconstruction and understanding has made strides in grounding natural language to physical 3D environments, it is still challenging to ground abstract, high-level instructions to a 3D scene. High-level instructions might not explicitly invoke semantic elements in the scene, and even the process of breaking a high-level task into a set of more concrete subtasks (a process called hierarchical task analysis) is environment-dependent. In this work, we propose ASHiTA, the first framework that generates a task hierarchy grounded to a 3D scene graph by breaking down high-level tasks into grounded subtasks. ASHiTA alternates LLM-assisted hierarchical task analysis, which generates the task breakdown, with task-driven 3D scene graph construction, which generates a suitable representation of the environment. Our experiments show that ASHiTA performs significantly better than LLM baselines in breaking down high-level tasks into environment-dependent subtasks and is additionally able to achieve grounding performance comparable to state-of-the-art methods.
CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
2025-06-10 · 1 citation
article · Senior author
We consider the problem of estimating object pose and shape from an RGB-D image. Our first contribution is to introduce CRISP, a category-agnostic object pose and shape estimation pipeline. The pipeline implements an encoder-decoder model for shape estimation. It uses FiLM-conditioning for implicit shape reconstruction and a DPT-based network for estimating pose-normalized points for pose estimation. As a second contribution, we propose an optimization-based pose and shape corrector that can correct estimation errors caused by a domain gap. Observing that the shape decoder is well behaved in the convex hull of known shapes, we approximate the shape decoder with an active shape model, and show that this reduces the shape correction problem to a constrained linear least squares problem, which can be solved efficiently by an interior point algorithm. Third, we introduce a self-training pipeline to perform self-supervised domain adaptation of CRISP. The self-training is based on a correct-and-certify approach, which leverages the corrector to generate pseudo-labels at test time, and uses them to self-train CRISP. We demonstrate CRISP (and the self-training) on YCBV, SPE3R, and NOCS datasets. CRISP shows high performance on all the datasets. Moreover, our self-training is capable of bridging a large domain gap. Finally, CRISP also shows an ability to generalize to unseen objects. Code, pre-trained models and videos of sample results are available on the project webpage.
Advancing AI Challenges for the United States Department of the Air Force
2025-09-15
article
The DAF-MIT AI Accelerator is a collaboration between the United States Department of the Air Force (DAF) and the Massachusetts Institute of Technology (MIT). This program pioneers fundamental advances in artificial intelligence (AI) to expand the competitive advantage of the United States in the defense and civilian sectors. In recent years, AI Accelerator projects have developed and launched public challenge problems aimed at advancing AI research in priority areas. Hallmarks of AI Accelerator challenges include large, publicly available, and AI-ready datasets to stimulate open-source solutions and engage the wider academic and private sector AI ecosystem. This article supplements our previous publication, which introduced AI Accelerator challenges. We provide an update on how ongoing and new challenges have successfully contributed to AI research and applications of AI technologies.
KISS-Matcher: Fast and Robust Point Cloud Registration Revisited
2025-05-19 · 7 citations
article · Senior author
While global point cloud registration systems have advanced significantly in all aspects, many studies have focused on specific components, such as feature extraction, graph-theoretic pruning, or pose solvers. In this paper, we take a holistic view on the registration problem and develop an open-source and versatile C++ library for point cloud registration, called KISS-Matcher. KISS-Matcher combines a novel feature detector, Faster-PFH, that improves over the classical fast point feature histogram (FPFH). Moreover, it adopts a k-core-based graph-theoretic pruning to reduce the time complexity of rejecting outlier correspondences. Finally, it combines these modules in a complete, user-friendly, and ready-to-use pipeline. As verified by extensive experiments, KISS-Matcher has superior scalability and broad applicability, achieving a substantial speed-up compared to state-of-the-art outlier-robust registration pipelines while preserving accuracy. Our code will be available at https://github.com/MIT-SPARK/KISS-Matcher.
Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs
ArXiv.org · 2025-06-09
preprint · Open access
In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, view-invariant relocalization (via the object-based map) and planning (via the 3D scene graph), allowing a team of robots to reason about their surroundings and execute complex tasks. Additionally, we introduce a planning approach that translates operator intent into Planning Domain Definition Language (PDDL) goals using a Large Language Model (LLM) by leveraging context from the shared 3D scene graph and robot capabilities. We provide an experimental assessment of the performance of our system on real-world tasks in large-scale, outdoor environments. A supplementary video is available at https://youtu.be/8xbGGOLfLAY.
Structured Interfaces for Automated Reasoning with 3D Scene Graphs
ArXiv.org · 2025-10-18
preprint · Open access
In order to provide a robot with the ability to understand and react to a user's natural language inputs, the natural language must be connected to the robot's underlying representations of the world. Recently, large language models (LLMs) and 3D scene graphs (3DSGs) have become a popular choice for grounding natural language and representing the world. In this work, we address the challenge of using LLMs with 3DSGs to ground natural language. Existing methods encode the scene graph as serialized text within the LLM's context window, but this encoding does not scale to large or rich 3DSGs. Instead, we propose to use a form of Retrieval Augmented Generation to select a subset of the 3DSG relevant to the task. We encode a 3DSG in a graph database and provide a query language interface (Cypher) as a tool to the LLM with which it can retrieve relevant data for language grounding. We evaluate our approach on instruction following and scene question-answering tasks and compare against baseline context window and code generation methods. Our results show that using Cypher as an interface to 3D scene graphs scales significantly better to large, rich graphs on both local and cloud-based models. This leads to large performance improvements in grounded language tasks while also substantially reducing the token count of the scene graph content. A video supplement is available at https://www.youtube.com/watch?v=zY_YI9giZSA.
Advancing AI Challenges for the United States Department of the Air Force
ArXiv.org · 2025-10-31
preprint · Open access
The DAF-MIT AI Accelerator is a collaboration between the United States Department of the Air Force (DAF) and the Massachusetts Institute of Technology (MIT). This program pioneers fundamental advances in artificial intelligence (AI) to expand the competitive advantage of the United States in the defense and civilian sectors. In recent years, AI Accelerator projects have developed and launched public challenge problems aimed at advancing AI research in priority areas. Hallmarks of AI Accelerator challenges include large, publicly available, and AI-ready datasets to stimulate open-source solutions and engage the wider academic and private sector AI ecosystem. This article supplements our previous publication, which introduced AI Accelerator challenges. We provide an update on how ongoing and new challenges have successfully contributed to AI research and applications of AI technologies.

Recent grants

CAREER: Certifiable Perception for Autonomous Cyber-Physical Systems
NSF · $580k · 2021–2026

Frequent coauthors

Yun Chang · 88 shared
Sertaç Karaman · Massachusetts Institute of Technology · 81 shared
Jingnan Shi · 76 shared
Heng Yang · 67 shared
Antoni Rosinol · Stanford University · 49 shared
Nathan Hughes · 44 shared
Allan Axelrod · University of Pittsburgh · 42 shared
Frank Dellaert · 39 shared

Labs

Human-System Collaboration · PI

Awards & honors

Outstanding Systems Paper Award at the Robotics: Science and…
IEEE Transactions on Robotics King-Sun Fu Memorial Best Pape…
AIAA Aeronautics and Astronautics Advising Award (2022)
Best Student Paper Award at the IEEE/RSJ International Confe…
Outstanding Associate Editor Award, International Conference…
