Louis Goldstein

Professor of Linguistics

University of Southern California · Linguistics

Active 1933–2026

h-index: 45

Citations: 9.4k

Papers: 416 (50 in the last 5 years)

Funding: $3.9M


Research topics

Computer Science
Artificial Intelligence
Speech recognition
Psychology
Audiology
Medicine
Pathology
Biology
Linguistics
Acoustics

Selected publications

A Long-Form Single-Speaker Real-Time MRI Speech Dataset and Benchmark
2026-04-21
article · Open access
We release the USC Long Single-Speaker (LSS) dataset containing real-time MRI video of the vocal tract dynamics and simultaneous audio obtained during speech production. This unique dataset contains roughly one hour of video and audio data from a single native speaker of American English, making it one of the longer publicly available single-speaker datasets of real-time MRI speech data. Along with the articulatory and acoustic raw data, we release derived representations of the data that are suitable for a range of downstream tasks. This includes video cropped to the vocal tract region, sentence-level splits of the data, restored and denoised audio, and regions-of-interest timeseries. We also benchmark this dataset on articulatory synthesis and phoneme recognition tasks, providing baseline performance for these tasks on this dataset which future research can aim to improve upon. Dataset website: https://sail.usc.edu/span/single_spk
Arti-6: Towards six-dimensional Articulatory Speech Encoding
2026-04-21
article
We propose ARTI-6, a compact six-dimensional articulatory speech encoding framework derived from real-time MRI data that captures crucial vocal tract regions including the velum, tongue root, and larynx. ARTI-6 consists of three components: (1) a six-dimensional articulatory feature set representing key regions of the vocal tract; (2) an articulatory inversion model, which predicts articulatory features from speech acoustics leveraging speech foundation models, achieving a prediction correlation of 0.87; and (3) an articulatory synthesis model, which reconstructs intelligible speech directly from articulatory features, showing that even a low-dimensional representation can generate natural-sounding speech. Together, ARTI-6 provides an interpretable, computationally efficient, and physiologically grounded framework for advancing articulatory inversion, synthesis, and broader speech technology applications. The source code and speech samples are publicly available.
Workshop 13 May 2024 : SPEECH PRODUCTION MODELS AND EMPIRICAL EVIDENCE FROM TYPICAL AND PATHOLOGICAL SPEECH
HAL (Le Centre pour la Communication Scientifique Directe) · 2025-01-01
dataset · Open access
This unpublished document can be cited as: Fougeron C., Goldstein L., Guenther F., Lœvenbruck H., Mefferd A., Mücke D., Niziolek C., Parrel B., Perrier P., Ziegler W., Laganaro M. (unpublished manuscript) Transcription of the workshop Speech Production Models and Empirical Evidence from Typical and Pathological Speech. 13 May 2024, Grenoble, France. doi: 10.26037/yareta:cvlb5qujzzc3ti3pgxq62y76ui.
The stability of articulatory and acoustic oscillatory signals derived from speech
JASA Express Letters · 2025-04-01
article · Open access · Senior author
Articulatory underpinnings of periodicities in the speech signal are unclear beyond a general alternation of vocal tract opening and closing. This study evaluates a modulatory articulatory signal that captures instantaneous change in vocal tract posture and its relation with two acoustic oscillatory signals, comparing stabilities to the progression of vowel and stressed vowel onsets. Modulatory signals can be calculated more efficiently than labeling linguistic events. These signals were more stable in periodicity than acoustic vowel onsets and not different from stressed vowel onsets, suggesting that an articulatory modulation function can provide a useful method for indexing foundational periodicities in speech without tedious annotation.
75-Speaker Annot-16: A benchmark dataset for speech articulatory rt-MRI annotation with articulator contours and phonetic alignment
2025-08-17
article · Open access
High-quality speech articulatory databases are essential for advancing speech science and technology research. However, the lack of standardized annotations limits their full potential use and broad accessibility. In this context, we introduce 75-Speaker Annot-16, a comprehensive annotation dataset derived from the 75-Speaker vocal tract MRI database. Annot-16 provides phonetic alignments, articulator contour annotations, and handmade ground-truth articulator contours. Our annotation process integrates automated algorithms with expert verification to ensure accuracy and efficiency. To demonstrate its utility, we establish three benchmark tasks: speech phoneme recognition, articulatory contour segmentation, and articulatory phoneme recognition. Annot-16 can serve as a valuable resource for speech modeling, computer vision, and cross-modal learning, bridging engineering applications, speech science, and linguistic research.
Co-registration of real-time MRI and respiration for speech research
2025-08-17 · 1 citation
article · Senior author
Towards disentangling the contributions of articulation and acoustics in multimodal phoneme recognition
ArXiv.org · 2025-05-29
preprint · Open access
Although many previous studies have carried out multimodal learning with real-time MRI data that captures the audio-visual kinematics of the vocal tract during speech, these studies have been limited by their reliance on multi-speaker corpora. This prevents such models from learning a detailed relationship between acoustics and articulation due to considerable cross-speaker variability. In this study, we develop unimodal audio and video models as well as multimodal models for phoneme recognition using a long-form single-speaker MRI corpus, with the goal of disentangling and interpreting the contributions of each modality. Audio and multimodal models show similar performance on different phonetic manner classes but diverge on places of articulation. Interpretation of the models' latent space shows similar encoding of the phonetic space across audio and multimodal models, while the models' attention weights highlight differences in acoustic and articulatory timing for certain phonemes.
Articulatory Feature Prediction from Surface EMG during Speech Production
2025-08-17 · 3 citations
article
Articulatory Feature Prediction from Surface EMG during Speech Production
ArXiv.org · 2025-05-20
preprint · Open access
We present a model for predicting articulatory features from surface electromyography (EMG) signals during speech production. The proposed model integrates convolutional layers and a Transformer block, followed by separate predictors for articulatory features. Our approach achieves a high prediction correlation of approximately 0.9 for most articulatory features. Furthermore, we demonstrate that these predicted articulatory features can be decoded into intelligible speech waveforms. To our knowledge, this is the first method to decode speech waveforms from surface EMG via articulatory features, offering a novel approach to EMG-based speech synthesis. Additionally, we analyze the relationship between EMG electrode placement and articulatory feature predictability, providing knowledge-driven insights for optimizing EMG electrode configurations. The source code and decoded speech samples are publicly available.
Instantaneous changes in acoustic signals reflect syllable progression and cross-linguistic syllable variation
2025-08-17
article · Senior author

Recent grants

NIH Grant R01DC008780
NIH · $3.3M · 2013
Collaborative Research: Prosodic Structure: An Integrated Empirical and Modeling Investigation
NSF · $111k · 2016–2022
Collaborative Research: Landmark-based Robust Speech Recognition Using Prosody-guided models of speech variability
NSF · $446k · 2007–2011

Frequent coauthors

Shrikanth Narayanan
109 shared
Dani Byrd
77 shared
Elliot Saltzman
Boston University
75 shared
Hosung Nam
57 shared
Michael Proctor
43 shared
Marianne Pouplier
Klinikum Saarbrücken
42 shared
Vikram Ramanarayanan
40 shared
Christine Mooshammer
Humboldt-Universität zu Berlin
39 shared

Labs

Louis Goldstein Lab · PI
