Achuta Kadambi

Professor · Verified

University of California, Los Angeles · Computer Science

Active 2013–2026

h-index: 18
Citations: 2.1k
Papers: 103 · 61 last 5y
Funding: $715k · 1 active
About

Achuta Kadambi is a leader of the Visual Machines Group and an Associate Professor at UCLA in the Department of Electrical Engineering and Computer Science. He holds a PhD from the Massachusetts Institute of Technology. His research focuses on computer vision and related visual perception technologies. He also advises graduate and postdoctoral researchers and takes part in collaborative research.

Research signals

Five dimensions sourced from public faculty / publication signals.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Computer vision
  • Algorithm
  • Psychology
  • Data science
  • Internet privacy

Selected publications

  • WorldBench: Disambiguating Physics for Diagnostic Evaluation of World Models

    ArXiv.org · 2026-01-29

    article · Open access · Senior author

    Recent advances in generative foundational models, often termed "world models," have propelled interest in applying them to critical tasks like robotic planning and autonomous system training. For reliable deployment, these models must exhibit high physical fidelity, accurately simulating real-world dynamics. Existing physics-based video benchmarks, however, suffer from entanglement, where a single test simultaneously evaluates multiple physical laws and concepts, fundamentally limiting their diagnostic capability. We introduce WorldBench, a novel video-based benchmark specifically designed for concept-specific, disentangled evaluation, allowing us to rigorously isolate and assess understanding of a single physical concept or law at a time. To make WorldBench comprehensive, we design benchmarks at two different levels: 1) an evaluation of intuitive physical understanding with concepts such as object permanence or scale/perspective, and 2) an evaluation of low-level physical constants and material properties such as friction coefficients or fluid viscosity. When SOTA video-based world models are evaluated on WorldBench, we find specific patterns of failure in particular physics concepts, with all tested models lacking the physical consistency required to generate reliable real-world interactions. Through its concept-specific evaluation, WorldBench offers a more nuanced and scalable framework for rigorously evaluating the physical reasoning capabilities of video generation and world models, paving the way for more robust and generalizable world-model-driven learning.

  • MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane

    arXiv (Cornell University) · 2026-03-20

    preprint · Open access · Senior author

    Monocular 3D object understanding has largely been cast as a 2D RoI-to-3D box lifting problem. However, emerging downstream applications require image-plane geometry (e.g., projected 3D box corners) which cannot be easily obtained without known intrinsics, a problem for object detection in the wild. We introduce MoCA3D, a Monocular, Class-Agnostic 3D model that predicts projected 3D bounding box corners and per-corner depths without requiring camera intrinsics at inference time. MoCA3D formulates pixel-space localization and depth assignment as dense prediction via corner heatmaps and depth maps. To evaluate image-plane geometric fidelity, we propose Pixel-Aligned Geometry (PAG), which directly measures image-plane corner and depth consistency. Extensive experiments demonstrate that MoCA3D achieves state-of-the-art performance, improving image-plane corner PAG by 22.8% while remaining comparable on 3D IoU, using up to 57 times fewer trainable parameters. Finally, we apply MoCA3D to downstream tasks which were previously impractical under unknown intrinsics, highlighting its utility beyond standard baseline models.

  • SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning

    arXiv (Cornell University) · 2026-03-28

    preprint · Open access

    Large vision-language models (VLMs) still struggle with reliable 3D spatial reasoning, a core capability for embodied and physical AI systems. This limitation arises from their inability to capture fine-grained 3D geometry and spatial relationships. While recent efforts have introduced multi-view geometry transformers into VLMs, they typically fuse only the deep-layer features from vision and geometry encoders, discarding rich hierarchical signals and creating a fundamental bottleneck for spatial understanding. To overcome this, we propose SpatialStack, a general hierarchical fusion framework that progressively aligns vision, geometry, and language representations across the model hierarchy. Moving beyond conventional late-stage vision-geometry fusion, SpatialStack stacks and synchronizes multi-level geometric features with the language backbone, enabling the model to capture both local geometric precision and global contextual semantics. Building upon this framework, we develop VLM-SpatialStack, a model that achieves state-of-the-art performance on multiple 3D spatial reasoning benchmarks. Extensive experiments and ablations demonstrate that our multi-level fusion strategy consistently enhances 3D understanding and generalizes robustly across diverse spatial reasoning tasks, establishing SpatialStack as an effective and extensible design paradigm for vision-language-geometry integration in next-generation multimodal physical AI systems.

  • SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning

    ArXiv.org · 2026-03-28

    article · Open access

  • WorldBench: Disambiguating Physics for Diagnostic Evaluation of World Models

    Open MIND · 2026-01-29

    preprint · Senior author

  • MoCA3D: Monocular 3D Bounding Box Prediction in the Image Plane

    ArXiv.org · 2026-03-20

    article · Open access · Senior author

  • Clinician Perspectives on Ambient AI Scribes in the ICU: A Qualitative Study of Acceptability and Integration in Team-Based Communication and Documentation (Preprint)

    JMIR Medical Informatics · 2025-07-29

    article · Open access · Senior author

  • Developing new solutions for data provenance and deepfake detection using physics, hardware, and machine learning

    2025-05-29

    article · Senior author

    As generative machine learning and deepfakes become increasingly important, reliable methods for protecting data provenance and authenticity are essential. Current approaches for verifying data provenance often rely on cryptographic measures. While cryptography can ensure the authenticity of data, it cannot guarantee the honesty/correctness of the data itself; for instance, if a sensor is spoofed, the generated data may be false even before the cryptographic process takes place. This paper introduces this new attack surface, the Physical Layer. We show a real example of how such an attack can be conducted. We then explore various solutions to address this concern, including leveraging hardware, sensing, and physics.

  • Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields

    2025-06-10

    article · Senior author

    Recent advancements in 2D and multimodal models have achieved remarkable success by leveraging large-scale training on extensive datasets. However, extending these achievements to enable free-form interactions and high-level semantic operations with complex 3D/4D scenes remains challenging. This difficulty stems from the limited availability of large-scale, annotated 3D/4D or multi-view datasets, which are crucial for generalizable vision and language tasks such as open-vocabulary and prompt-based segmentation, language-guided editing, and visual question answering (VQA). In this paper, we introduce Feature4X, a universal framework designed to extend any functionality from 2D vision foundation model into the 4D realm, using only monocular video input, which is widely available from user-generated content. The "X" in Feature4X represents its versatility, enabling any task through adaptable, model-conditioned 4D feature field distillation. At the core of our framework is a dynamic optimization strategy that unifies multiple model capabilities into a single representation. Additionally, to the best of our knowledge, Feature4X is the first method to distill and lift the features of video foundation models (e.g. SAM2, InternVideo2) into an explicit 4D feature field using Gaussian Splatting. Our experiments showcase novel view segment anything, geometric and appearance scene editing, and free-form VQA across all time steps, empowered by LLMs in feedback loops. These advancements broaden the scope of agentic AI applications by providing a foundation for scalable, contextually and spatiotemporally aware systems capable of immersive dynamic 4D scene interaction.

  • Exploring Ambient Artificial Intelligence to Enhance Learning and Feedback During OR-to-ICU Handoffs: A Co-Design and Simulation Study (Preprint)

    JMIR Medical Education · 2025-10-24

    article · Open access · Senior author

    Background: Operating room–to–intensive care unit (OR-to-ICU) handoffs are among the most complex and high-risk communication events in perioperative care. Despite the implementation of structured checklists, trainees often receive limited feedback on their communication skills, and simulation-based education rarely provides objective data on communication performance and checklist adherence. This study explores how an ambient AI handoff assistant used during simulation-based training of OR-to-ICU handoff discussions can enhance clinical communication training and AI literacy by mapping spoken handoff discussions to handoff checklist items, enabling the development of a handoff note that functioned as a structured, feedback-rich learning artifact.

    Objective: To co-design and evaluate an ambient AI handoff assistant that captures spoken OR-to-ICU handoff communication, maps it to handoff checklist items, and provides immediate feedback on handoff completeness during simulated OR-to-ICU transitions in an educational setting.

    Methods: A two-phase mixed-methods study was conducted within the UCLA Department of Anesthesiology and Perioperative Care (July–October 2025). Phase 1 comprised co-design interviews with four clinician educators to identify limitations of current handoff training and inform AI feature development. Phase 2 involved an error analysis, as well as evaluations of usability, workload, and educational impact, conducted through ten 60-minute simulation sessions with pairs of medical students and first-year residents. Quantitative measures included Physician Task Load Index (PTL), System Usability Scale (SUS), and a post-simulation survey; qualitative data from co-design sessions and simulation debrief interviews were thematically analyzed.

    Results: Educators highlighted inconsistent checklist use and the absence of objective feedback on learners’ communication skills as key areas that could benefit from structured documentation of handoff discussions using AI. Error analysis of the ambient AI handoff assistant revealed a mean of 3.6 errors per note, with incorrect output being the most frequent error type. There was no statistically significant difference between the ambient AI handoff assistant and the paper checklist with respect to PTL and SUS measures. Trainees valued real-time transcripts and structured handoff notes for reflection of communication practices, and exposure to AI documentation errors enhanced critical thinking and awareness of AI technology limitations.

    Conclusions: The ambient AI handoff assistant mapped simulated handoff discussions to checklist items and generated a structured handoff note, facilitating reflection on team-based communication skills in handoff education. Imperfections in the AI’s output encouraged critical appraisal of its capabilities and prompted discussion about automation complacency, suggesting that AI-assisted simulations can foster both communication and digital literacy skills essential for future AI-enabled clinical practice.

Frequent coauthors

  • Ramesh Raskar

    36 shared
  • Yunhao Ba

    24 shared
  • Pradyumna Chari

    18 shared
  • Vage Taamazyan

    Intrinsic LifeSciences (United States)

    15 shared
  • Howard Zhang

    12 shared
  • Ayush Bhandari

    11 shared
  • Boxin Shi

    Peking University

    9 shared
  • Refael Whyte

    9 shared

Education

  • Ph.D., Electrical Engineering and Computer Science

    Massachusetts Institute of Technology (MIT)

    2018

Awards & honors

  • NSF CAREER Award (2021)
  • DARPA Young Faculty Award (YFA) (2021)
  • Army Research Office Young Investigator Award (YIP) (2021)
  • National Academy of Engineering (NAE) Frontiers in Engineeri…
  • Senior Member, National Academy of Inventors (2020)