Louis Goldstein
Professor of Linguistics · University of Southern California · Linguistics
Active 1933–2026
Research topics
- Computer Science
- Artificial Intelligence
- Speech recognition
- Psychology
- Audiology
- Medicine
- Pathology
- Biology
- Linguistics
- Acoustics
Selected publications
A Long-Form Single-Speaker Real-Time MRI Speech Dataset and Benchmark
2026-04-21
article · Open access
We release the USC Long Single-Speaker (LSS) dataset containing real-time MRI video of the vocal tract dynamics and simultaneous audio obtained during speech production. This unique dataset contains roughly one hour of video and audio data from a single native speaker of American English, making it one of the longer publicly available single-speaker datasets of real-time MRI speech data. Along with the articulatory and acoustic raw data, we release derived representations of the data that are suitable for a range of downstream tasks. This includes video cropped to the vocal tract region, sentence-level splits of the data, restored and denoised audio, and regions-of-interest timeseries. We also benchmark this dataset on articulatory synthesis and phoneme recognition tasks, providing baseline performance for these tasks on this dataset which future research can aim to improve upon. Dataset website: https://sail.usc.edu/span/single_spk
Arti-6: Towards six-dimensional Articulatory Speech Encoding
2026-04-21
article
We propose ARTI-6, a compact six-dimensional articulatory speech encoding framework derived from real-time MRI data that captures crucial vocal tract regions including the velum, tongue root, and larynx. ARTI-6 consists of three components: (1) a six-dimensional articulatory feature set representing key regions of the vocal tract; (2) an articulatory inversion model, which predicts articulatory features from speech acoustics leveraging speech foundation models, achieving a prediction correlation of 0.87; and (3) an articulatory synthesis model, which reconstructs intelligible speech directly from articulatory features, showing that even a low-dimensional representation can generate natural-sounding speech. Together, ARTI-6 provides an interpretable, computationally efficient, and physiologically grounded framework for advancing articulatory inversion, synthesis, and broader speech technology applications. The source code and speech samples are publicly available.
HAL (Le Centre pour la Communication Scientifique Directe) · 2025-01-01
dataset · Open access
This unpublished document can be cited as: Fougeron C., Goldstein L., Guenther F., Lœvenbruck H., Mefferd A., Mücke D., Niziolek C., Parrel B., Perrier P., Ziegler W., Laganaro M. (unpublished manuscript) Transcription of the workshop Speech Production Models and Empirical Evidence from Typical and Pathological Speech. 13 May 2024, Grenoble, France. doi: 10.26037/yareta:cvlb5qujzzc3ti3pgxq62y76ui.
The stability of articulatory and acoustic oscillatory signals derived from speech
JASA Express Letters · 2025-04-01
article · Open access · Senior author
Articulatory underpinnings of periodicities in the speech signal are unclear beyond a general alternation of vocal tract opening and closing. This study evaluates a modulatory articulatory signal that captures instantaneous change in vocal tract posture and its relation with two acoustic oscillatory signals, comparing stabilities to the progression of vowel and stressed vowel onsets. Modulatory signals can be calculated more efficiently than labeling linguistic events. These signals were more stable in periodicity than acoustic vowel onsets and not different from stressed vowel onsets, suggesting that an articulatory modulation function can provide a useful method for indexing foundational periodicities in speech without tedious annotation.
2025-08-17
article · Open access
High-quality speech articulatory databases are essential for advancing speech science and technology research. However, the lack of standardized annotations limits their full potential use and broad accessibility. In this context, we introduce 75-Speaker Annot-16, a comprehensive annotation dataset derived from the 75-Speaker vocal tract MRI database. Annot-16 provides phonetic alignments, articulator contour annotations, and handmade ground-truth articulator contours. Our annotation process integrates automated algorithms with expert verification to ensure accuracy and efficiency. To demonstrate its utility, we establish three benchmark tasks: speech phoneme recognition, articulatory contour segmentation, and articulatory phoneme recognition. Annot-16 can serve as a valuable resource for speech modeling, computer vision, and cross-modal learning, bridging engineering applications, speech science, and linguistic research.
Co-registration of real-time MRI and respiration for speech research
2025-08-17 · 1 citation
article · Senior author
ArXiv.org · 2025-05-29
preprint · Open access
Although many previous studies have carried out multimodal learning with real-time MRI data that captures the audio-visual kinematics of the vocal tract during speech, these studies have been limited by their reliance on multi-speaker corpora. This prevents such models from learning a detailed relationship between acoustics and articulation due to considerable cross-speaker variability. In this study, we develop unimodal audio and video models as well as multimodal models for phoneme recognition using a long-form single-speaker MRI corpus, with the goal of disentangling and interpreting the contributions of each modality. Audio and multimodal models show similar performance on different phonetic manner classes but diverge on places of articulation. Interpretation of the models' latent space shows similar encoding of the phonetic space across audio and multimodal models, while the models' attention weights highlight differences in acoustic and articulatory timing for certain phonemes.
Articulatory Feature Prediction from Surface EMG during Speech Production
2025-08-17 · 3 citations
article
Articulatory Feature Prediction from Surface EMG during Speech Production
ArXiv.org · 2025-05-20
preprint · Open access
We present a model for predicting articulatory features from surface electromyography (EMG) signals during speech production. The proposed model integrates convolutional layers and a Transformer block, followed by separate predictors for articulatory features. Our approach achieves a high prediction correlation of approximately 0.9 for most articulatory features. Furthermore, we demonstrate that these predicted articulatory features can be decoded into intelligible speech waveforms. To our knowledge, this is the first method to decode speech waveforms from surface EMG via articulatory features, offering a novel approach to EMG-based speech synthesis. Additionally, we analyze the relationship between EMG electrode placement and articulatory feature predictability, providing knowledge-driven insights for optimizing EMG electrode configurations. The source code and decoded speech samples are publicly available.
2025-08-17
article · Senior author
Recent grants
NIH · $3.3M · 2013
Collaborative Research: Prosodic Structure: An Integrated Empirical and Modeling Investigation
NSF · $111k · 2016–2022
NSF · $446k · 2007–2011
Frequent coauthors
- 109 shared
Shrikanth Narayanan
- 77 shared
Dani Byrd
- 75 shared
Elliot Saltzman
Boston University
- 57 shared
Hosung Nam
- 43 shared
Michael Proctor
- 42 shared
Marianne Pouplier
Klinikum Saarbrücken
- 40 shared
Vikram Ramanarayanan
- 39 shared
Christine Mooshammer
Humboldt-Universität zu Berlin