Ann Lee
Professor · Verified · Carnegie Mellon University · Machine Learning Department
Active 1982–2026
About
Ann B Lee is a professor in the Department of Statistics & Data Science at Carnegie Mellon University, with a joint appointment in the Machine Learning Department. She is the Co-Director of the STAMPS@CMU Research Center, which she co-founded. Her research interests focus on developing statistical methodology for complex data and problems in the physical sciences, with particular emphasis on trustworthy scientific inference and uncertainty quantification using generative models. She aims to bridge classical statistics and machine learning for simulation-based inference and experimental design, with recent work involving likelihood-free inference, calibrated predictive distributions, and applications in astronomy, probabilistic severe weather forecasting, and rare event detection involving satellite imagery and large-scale studies. Prior to her current position, she was the J.W. Gibbs Assistant Professor at Yale University and served as a visiting research associate at Brown University. She holds a PhD in Physics from Brown University and an MSc/BSc in Engineering Physics from Chalmers University of Technology in Sweden. Ann Lee is also involved in editorial leadership for the new journal PRX Intelligence and is active in organizing workshops and conferences related to her research areas.
Research topics
- Artificial Intelligence
- Computer Science
- Natural Language Processing
- Speech recognition
- Machine Learning
- Computer Security
- Engineering
- Epistemology
- Geography
- Philosophy
- Linguistics
- Cartography
- Programming language
Selected publications
SSRN Electronic Journal · 2026-01-01
preprint · Open access · 1st author · Corresponding
Author Correction: Joint speech and text machine translation for up to 100 languages
Nature · 2025-02-03
erratum · Open access
Toward the end-to-end optimization of the SWGO array layout
Nuclear Physics B · 2025-05-02 · 1 citations
article · Open access
In this document we consider the problem of finding the optimal layout for the array of water Cherenkov detectors proposed by the SWGO collaboration to study very-high-energy gamma rays in the southern hemisphere. We develop a continuous model of the secondary particles produced by atmospheric showers initiated by high-energy gamma rays and protons, and build an optimization pipeline capable of identifying the most promising configuration of the detector elements. The pipeline employs stochastic gradient descent to maximize a utility function aligned with the scientific goals of the experiment. We demonstrate how the software is capable of finding the global maximum in the high-dimensional parameter space, and discuss its performance and limitations.
Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning
arXiv (Cornell University) · 2025-12-22
preprint · Open access
We introduce Perception Encoder Audiovisual, PE-AV, a new family of encoders for audio and video understanding trained with scaled contrastive learning. Built on PE, PE-AV makes several key contributions to extend representations to audio, and natively support joint embeddings across audio-video, audio-text, and video-text modalities. PE-AV's unified cross-modal embeddings enable novel tasks such as speech retrieval, and set a new state of the art across standard audio and video benchmarks. We unlock this by building a strong audiovisual data engine that synthesizes high-quality captions for O(100M) audio-video pairs, enabling large-scale supervision consistent across modalities. Our audio data includes speech, music, and general sound effects, avoiding single-domain limitations common in prior work. We exploit ten pairwise contrastive objectives, showing that scaling cross-modality and caption-type pairs strengthens alignment and improves zero-shot performance. We further develop PE-A-Frame by fine-tuning PE-AV with frame-level contrastive objectives, enabling fine-grained audio-frame-to-text alignment for tasks such as sound event detection.
Journal of Drug Delivery Science and Technology · 2025-01-05 · 1 citations
article · Open access
Understanding the complex mechanisms underpinning the transport and deposition of aerosols in the human airways is important for improving the effectiveness of oral inhaled drug delivery. This study investigates the impact of the saber-sheath trachea index on particle deposition in the lung using computational fluid dynamics (CFD). Magnetic resonance imaging (MRI) was performed on a healthy volunteer using a mock-up inhaler. Five three-dimensional models were reconstructed from MRI with different saber-sheath trachea indices, ranging from 0.4 to 0.6, representing different severity of the saber-sheath features. CFD simulations coupled with discrete phase modelling with monodisperse particles were performed using two different transient flow profiles with peak inspiratory flow rates of 60 L/min and 120 L/min using a zero-resistance inhaler. Model predictions were validated against measurement data obtained from Particle Image Velocimetry. The results demonstrated that increased saber-sheath trachea severity amplifies differences in particle distribution between the right and left lungs, with a saber-sheath index of 0.5 being the most restrictive in terms of particles reaching the subject-specific lower airways.
- Five 3D tracheal models were built from MRI data with saber-sheath trachea indices ranging from 0.4 to 0.6.
- CFD results were validated against Particle Image Velocimetry (PIV) data for model accuracy.
- A consistent trend of unequal particle deposition between the left and right lungs was revealed.
- Saber-sheath index of 0.5 significantly restricted drug particle deposition, especially in the right lung.
Joint speech and text machine translation for up to 100 languages
Nature · 2025-01-15 · 22 citations
article · Open access
Creating the Babel Fish, a tool that helps individuals translate speech between any two languages, requires advanced technological innovation and linguistic expertise. Although conventional speech-to-speech translation systems composed of multiple subsystems performing translation in a cascaded fashion exist [1–3], scalable and high-performing unified systems [4,5] remain underexplored. To address this gap, here we introduce SEAMLESSM4T (Massively Multilingual and Multimodal Machine Translation), a single model that supports speech-to-speech translation (from 101 to 36 languages), speech-to-text translation (from 101 to 96 languages), text-to-speech translation (from 96 to 36 languages), text-to-text translation (96 languages) and automatic speech recognition (96 languages). Built using a new multimodal corpus of automatically aligned speech translations and other publicly available data, SEAMLESSM4T is one of the first multilingual systems that can translate from and into English for both speech and text. Moreover, it outperforms the existing state-of-the-art cascaded systems, achieving up to 8% and 23% higher BLEU (Bilingual Evaluation Understudy) scores in speech-to-text and speech-to-speech tasks, respectively. Beyond quality, when tested for robustness, our system is, on average, approximately 50% more resilient against background noise and speaker variations in speech-to-text tasks than the previous state-of-the-art systems. We evaluated SEAMLESSM4T on added toxicity and gender bias to assess translation safety. For the former, we included two strategies for added toxicity mitigation working at either training or inference time. Finally, all contributions in this work are publicly available for non-commercial use to propel further research on inclusive speech translation technologies.
SEAMLESSM4T is a single machine translation tool that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation and automatic speech recognition between up to 100 languages.
arXiv (Cornell University) · 2024-06-04
preprint · Open access · Senior author
In this paper, we propose a textless acoustic model with a self-supervised distillation strategy for noise-robust expressive speech-to-speech translation (S2ST). Recently proposed expressive S2ST systems have achieved impressive expressivity preservation by cascading a unit-to-speech (U2S) generator with the speech-to-unit translation model. However, these systems are vulnerable to noise in the input speech, which must be assumed in real-world translation scenarios. To address this limitation, we propose a U2S generator that incorporates a distillation with no label (DINO) self-supervised training strategy into its pretraining process. Because the proposed method captures noise-agnostic expressivity representation, it can generate qualified speech even in noisy environments. Objective and subjective evaluation results verified that the proposed method significantly improved the performance of the expressive S2ST system in noisy environments while maintaining competitive performance in clean environments.
International Journal of Pharmaceutics · 2024-12-19 · 7 citations
article · Open access
The nasal airway comprises a complex network of passages and chambers and plays an important role in regulating the respiratory system's functions. The nasal vestibule is the first chamber of the nasal airway. While variations in nasal vestibule geometries are known to exist between humans, how they affect the efficacy of nasal drug delivery devices is less clear. In this study, an investigation into how geometrical variations in the nasal vestibule could affect particle deposition was conducted to elucidate the role of the vestibule in respiratory physiology. MRI was used to image the nasal airway of 11 subjects. The vestibules in the subjects were reconstructed using 3D Slicer, and integrated with a common nasal turbinate to isolate the complexities in flow behavior when subject-specific turbinates were used. This approach minimises the impact of anatomical variations downstream of the vestibule, allowing for a focused evaluation of the vestibule's specific role in airflow dynamics and particle deposition. Particle deposition was examined using a steady flow rate of 15 L/min. Results from this study show that airflow velocity is highest in the middle region of the nasal airway's cross-section, while the olfactory and turbinate regions experience relatively lower airflow. A significant relationship (P < 0.05) between the nostril area, vestibule surface-to-volume ratio and particle deposition was also determined for small particle sizes (10-15 μm), demonstrating the feasibility of tailoring nasal drug delivery efficacies in individuals by cross-examining their nostril area and vestibule surface-to-volume ratio.
Applied Thermal Engineering · 2024-08-03 · 12 citations
article · Open access
The utilisation of advanced pin fin designs in microchannels is useful for enhancing cooling efficiency. Advancements in machine learning and processing power have sparked interest in shape optimisation techniques. This research employs a novel framework that integrates Deep Artificial Neural Networks and Reinforcement Learning with a Computational Fluid Dynamics (CFD) solver to optimise multiple pin fin shapes within a microchannel. By incorporating Radial Basis Function interpolation and Proximal Policy Optimisation alongside FLUENT, acting as the CFD environment, the reinforcement learning agent adeptly explores the design space to enhance the thermohydraulic performance factor (TPF), aiming to maximise Nusselt number while minimising pressure loss. Unlike previous heat transfer optimisation studies, which typically required mesh regeneration at each step, the proposed framework could bypass the meshing step and alter the geometry directly by relying on the RBF interpolation technique to deform the mesh directly. Three distinct scenarios investigated in this study are uniform deformation of all pin fins, deformation of all pin fins arranged in two rows, and individual deformation of each pin fin. Extensive simulations, exceeding 90,000 different cases, demonstrate that although the optimisation process for the individual deformation of each pin fin requires more iterations compared to others, it surpasses them in terms of TPF improvement. Notably, significant improvements are achieved, such as a 49 % enhancement in Nusselt number and a 33 % reduction in pressure drop, culminating in an impressive 63 % increase in TPF compared to the initial geometry.
Frequent coauthors
- Teri Hutchison · 115 shared
- Sean O'donnell · Eli Lilly (United States) · 115 shared
- Julie Burton · Perspectives Charter School · 115 shared
- Fred Furtner · International Rescue Committee · 115 shared
- Michael Mcgraw · Perspectives Charter School · 115 shared
- Sandra Friedman · University of Colorado Anschutz Medical Campus · 115 shared
- Maria Duda · 115 shared
- Frederick P. Rivara · Seattle Children's Hospital · 115 shared
Labs
Ann B Lee Lab · PI
Education
Ph.D.
Brown University
MSc/BSc, Engineering Physics
Chalmers University of Technology, Sweden
Awards & honors
- STAMPS@CMU becoming a CMU Research Center in Fall 2024
- Finalist at the ASA SPES and Q&P Student Paper Competition f…