Lerrel Pinto
· Assistant Professor of Computer ScienceNew York University · Mathematics
Active 2012–2026
About
Lerrel Pinto is a roboticist at Meta Superintelligence Labs (MSL), where he leads efforts in building frontier robotics models to achieve personal superintelligence in the physical world. He is also the co-founder and CEO of Assured Robot Intelligence (ARI), which was acquired by Meta in May 2026. Previously, he co-founded Fauna Robotics, acquired by Amazon. In academia, he runs the General Robotics & AI Lab (GRAIL) at NYU, focusing on robot learning, sensory representation learning, action and behavior modeling, reinforcement learning for adaptation, and the development of affordable, open-source robots. His work has been recognized with numerous awards including the Samsung AI researcher of the year award, Sloan Fellowship, Packard Fellowship, CIFAR Fellowship, TR35, IEEE RAS Early Career, and NSF CAREER awards.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Mathematics
- Human–computer interaction
- Programming language
- Computer vision
- Computer graphics (images)
- Algorithm
Selected publications
Hierarchical Latent Action Model
ArXiv.org · 2026-03-06
articleOpen accessLatent Action Models (LAMs) enable learning from actionless data for applications ranging from robotic control to interactive world models. However, existing LAMs typically focus on short-horizon frame transitions and capture low-level motion while overlooking longer-term temporal structure. In contrast, actionless videos often contain temporally extended and high-level skills. We present HiLAM, a hierarchical latent action model that discovers latent skills by modeling long-term temporal information. To capture these dependencies across long horizons, we utilize a pretrained LAM as a low-level extractor. This architecture aggregates latent action sequences, which contain the underlying dynamic patterns of the video, into high-level latent skills. Our experiments demonstrate that HiLAM improves over the baseline and exhibits robust dynamic skill discovery.
Hierarchical Latent Action Model
Open MIND · 2026-03-06
preprintLatent Action Models (LAMs) enable learning from actionless data for applications ranging from robotic control to interactive world models. However, existing LAMs typically focus on short-horizon frame transitions and capture low-level motion while overlooking longer-term temporal structure. In contrast, actionless videos often contain temporally extended and high-level skills. We present HiLAM, a hierarchical latent action model that discovers latent skills by modeling long-term temporal information. To capture these dependencies across long horizons, we utilize a pretrained LAM as a low-level extractor. This architecture aggregates latent action sequences, which contain the underlying dynamic patterns of the video, into high-level latent skills. Our experiments demonstrate that HiLAM improves over the baseline and exhibits robust dynamic skill discovery.
Bridging the Human to Robot Dexterity Gap Through Object-Oriented Rewards
2025-05-19 · 3 citations
articleSenior authorTraining robots directly from human videos is an emerging area in robotics and computer vision. While there has been notable progress with two-fingered grippers, learning autonomous tasks without teleoperation remains a difficult problem for multi-fingered robot hands. A key reason for this difficulty is that a policy trained on human hands may not directly transfer to a robot hand with a different morphology. In this work, we present HUDOR, a technique that enables online fine-tuning of the policy by constructing a reward function from the human video. Importantly, this reward function is built using object-oriented rewards derived from off-the-shelf point trackers, which allows for meaningful learning signals even when the robot hand is in the visual observation, while the human hand is used to construct the reward. Given a single video of human solving a task, such as gently opening a music box, HUDOR allows our four-fingered Allegro hand to learn this task with just an hour of online interaction. Our experiments across four tasks, show that HUDOR outperforms alternatives with an average of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$4 \times$</tex> improvement. Code and videos are available on our website https://object-rewards.github.io/.
Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations
ArXiv.org · 2025-11-20
preprintOpen accessLearning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. Achieving this would mark significant progress toward generalizable robot manipulation in human environments, as it would reduce the reliance on labor-intensive robot data collection. Despite substantial efforts, progress toward this goal has been bottle-necked by the embodiment gap between humans and robots, as well as by difficulties in extracting relevant contextual and motion cues that enable learning of autonomous policies from in-the-wild human videos. We claim that with simple yet sufficiently powerful hardware for obtaining human data and our proposed framework AINA, we are now one significant step closer to achieving this dream. AINA enables learning multi-fingered policies from data collected by anyone, anywhere, and in any environment using Aria Gen 2 glasses. These glasses are lightweight and portable, feature a high-resolution RGB camera, provide accurate on-board 3D head and hand poses, and offer a wide stereo view that can be leveraged for depth estimation of the scene. This setup enables the learning of 3D point-based policies for multi-fingered hands that are robust to background changes and can be deployed directly without requiring any robot data (including online corrections, reinforcement learning, or simulation). We compare our framework against prior human-to-robot policy learning approaches, ablate our design choices, and demonstrate results across nine everyday manipulation tasks. Robot rollouts are best viewed on our website: https://aina-robot.github.io.
AnySkin: Plug-and-Play Skin Sensing for Robotic Touch
2025-05-19 · 3 citations
articleSenior authorWhile tactile sensing is widely accepted as an important and useful sensing modality, its use pales in comparison to other sensory modalities like vision and proprioception. AnySkin addresses the critical challenges that impede the use of tactile sensing - versatility, replaceability, and data reusability. Building on the simplistic design of ReSkin, and decoupling the sensing electronics from the sensing interface, AnySkin simplifies integration making it as straightforward as putting on a phone case and connecting a charger. Furthermore, AnySkin is the first uncalibrated tactile-sensor to report crossinstance generalizability of learned manipulation policies. To summarize, this work makes three key contributions: first, we introduce a streamlined fabrication process and a design tool for creating an adhesive-free, durable and easily replaceable magnetic tactile sensor; second, we characterize slip detection and policy learning with the AnySkin sensor; third, we demonstrate zero-shot generalization of models trained on one instance of AnySkin to new instances, and compare it with popular existing tactile solutions like DIGIT and ReSkin. Code, design files, and videos of policy experiments can be found on https://any-skin.github.io
RUKA: Rethinking the Design of Humanoid Hands with Learning
2025-06-21 · 3 citations
articleOpen accessSenior authorDexterous manipulation is a fundamental capability for robotic systems, yet progress has been limited by hardware trade-offs between precision, compactness, strength, and affordability.Existing control methods impose compromises on hand designs and applications.However, learning-based approaches present opportunities to rethink these trade-offs, particularly to address challenges with tendon-driven actuation and low-cost materials.This work presents RUKA, a tendon-driven humanoid hand that is compact, affordable, and capable.Made from 3Dprinted parts and off-the-shelf components, RUKA has 5 fingers with 15 underactuated degrees of freedom enabling diverse human-like grasps.Its tendon-driven actuation allows powerful grasping in a compact, human-sized form factor.To address control challenges, we learn joint-to-actuator and fingertip-toactuator models from motion-capture data collected by the MANUS glove, leveraging the hand's morphological accuracy.Extensive evaluations demonstrate RUKA's superior reachability, durability, and strength compared to other robotic hands.Teleoperation tasks further showcase RUKA's dexterous movements.The open-source design and assembly instructions of RUKA, code, and data are available at ruka-hand.github.io
Dynamem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
2025-05-19 · 2 citations
articleSenior authorSignificant progress has been made in openvocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given a natural language description. However, most current systems assume a static environment, which limits the system's applicability in realworld scenarios where environments frequently change due to human intervention or the robot's own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot's environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on the Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70 % on non-stationary objects, which is more than a <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\mathbf{2} \times$</tex> improvement over state-of-the-art static systems.
Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation
ArXiv.org · 2025-02-27
preprintOpen accessSenior authorBuilding robotic agents capable of operating across diverse environments and object types remains a significant challenge, often requiring extensive data collection. This is particularly restrictive in robotics, where each data point must be physically executed in the real world. Consequently, there is a critical need for alternative data sources for robotics and frameworks that enable learning from such data. In this work, we present Point Policy, a new method for learning robot policies exclusively from offline human demonstration videos and without any teleoperation data. Point Policy leverages state-of-the-art vision models and policy architectures to translate human hand poses into robot poses while capturing object states through semantically meaningful key points. This approach yields a morphology-agnostic representation that facilitates effective policy learning. Our experiments on 8 real-world tasks demonstrate an overall 75% absolute improvement over prior works when evaluated in identical settings as training. Further, Point Policy exhibits a 74% gain across tasks for novel object instances and is robust to significant background clutter. Videos of the robot are best viewed at https://point-policy.github.io/.
eFlesh: Highly customizable Magnetic Touch Sensing using Cut-Cell Microstructures
ArXiv.org · 2025-06-11 · 1 citations
preprintOpen accessIf human experience is any guide, operating effectively in unstructured environments -- like homes and offices -- requires robots to sense the forces during physical interaction. Yet, the lack of a versatile, accessible, and easily customizable tactile sensor has led to fragmented, sensor-specific solutions in robotic manipulation -- and in many cases, to force-unaware, sensorless approaches. With eFlesh, we bridge this gap by introducing a magnetic tactile sensor that is low-cost, easy to fabricate, and highly customizable. Building an eFlesh sensor requires only four components: a hobbyist 3D printer, off-the-shelf magnets (<$5), a CAD model of the desired shape, and a magnetometer circuit board. The sensor is constructed from tiled, parameterized microstructures, which allow for tuning the sensor's geometry and its mechanical response. We provide an open-source design tool that converts convex OBJ/STL files into 3D-printable STLs for fabrication. This modular design framework enables users to create application-specific sensors, and to adjust sensitivity depending on the task. Our sensor characterization experiments demonstrate the capabilities of eFlesh: contact localization RMSE of 0.5 mm, and force prediction RMSE of 0.27 N for normal force and 0.12 N for shear force. We also present a learned slip detection model that generalizes to unseen objects with 95% accuracy, and visuotactile control policies that improve manipulation performance by 40% over vision-only baselines -- achieving 91% average success rate for four precise tasks that require sub-mm accuracy for successful completion. All design files, code and the CAD-to-eFlesh STL conversion tool are open-sourced and available on https://e-flesh.com.
EgoZero: Robot Learning from Smart Glasses
ArXiv.org · 2025-05-26
preprintOpen accessSenior authorDespite recent progress in general purpose robotics, robot policies still lag far behind basic human capabilities in the real world. Humans interact constantly with the physical world, yet this rich data resource remains largely untapped in robot learning. We propose EgoZero, a minimal system that learns robust manipulation policies from human demonstrations captured with Project Aria smart glasses, $\textbf{and zero robot data}$. EgoZero enables: (1) extraction of complete, robot-executable actions from in-the-wild, egocentric, human demonstrations, (2) compression of human visual observations into morphology-agnostic state representations, and (3) closed-loop policy learning that generalizes morphologically, spatially, and semantically. We deploy EgoZero policies on a gripper Franka Panda robot and demonstrate zero-shot transfer with 70% success rate over 7 manipulation tasks and only 20 minutes of data collection per task. Our results suggest that in-the-wild human data can serve as a scalable foundation for real-world robot learning - paving the way toward a future of abundant, diverse, and naturalistic training data for robots. Code and videos are available at https://egozero-robot.github.io.
Frequent coauthors
- 43 shared
Pieter Abbeel
University of California, Berkeley
- 21 shared
Abhinav Gupta
- 21 shared
Abhinav Gupta
- 18 shared
Dhiraj Gandhi
Meta (Israel)
- 16 shared
Nur Muhammad Mahi Shafiullah
- 11 shared
Soumith Chintala
- 11 shared
Siddhant Haldar
- 9 shared
Jyothish Pari
Education
Ph.D., Robotics
Unknown
M.S., Robotics
Unknown
B.S., Robotics
Unknown
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Lerrel Pinto
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup