Erika Shield Levy
Professor of Communication Sciences and Disorders · Columbia University · Curriculum & Teaching
Active 1940–2026
About
Erika S. Levy, Ph.D., CCC-SLP, is the Director of the Speech Production and Perception Lab and a Professor of Communication Sciences and Disorders at Teachers College, Columbia University. She also serves as an Associate Editor of the Journal of Phonetics. Raised in Prague and Vienna, Dr. Levy has a diverse academic background, holding a B.A. in Psychology from Wesleyan University, an M.A. in Linguistics from New York University, an M.A. in Speech and Language Pathology from Lehman College, and a Ph.D. in Speech and Hearing Sciences from the Graduate School and University Center, City University of New York. Her scholarly work has focused on treatment efficacy for the motor speech disorder of dysarthria, cross-language speech perception, and the bilingual lexicon. Currently, her research investigates treatment efficacy aimed at improving intelligibility in children with dysarthria due to cerebral palsy and in adults who speak Spanish, Mandarin, and American English with dysarthria resulting from Parkinson's Disease. Additionally, her laboratory studies the perception and production of American English vowels by native Spanish speakers. Dr. Levy is a trilingual speech-language pathologist and has experience working as a pronunciation coach for the characters Big Bird and Elmo on Sesame Street.
Research topics
- Audiology
- Medicine
- Psychology
- Physical therapy
- Surgery
- Linguistics
Selected publications
Automatic speech recognition for childhood dysarthria (Choi et al., 2026)
Open MIND · 2026-02-26
dataset · Senior author
Purpose: Accurate assessment of speech intelligibility is critical for children with dysarthria secondary to cerebral palsy. Traditional assessment methods, such as human listeners’ orthographic transcription and perceptual ratings (e.g., of ease of understanding [EoU]), are time consuming or subjective. Automatic speech recognition (ASR) may provide a more efficient, objective alternative, but its use for assessing intelligibility in this population is unexamined. This study evaluated the potential of ASR for intelligibility assessment in children with dysarthria and identified the most appropriate ASR systems for approximating human listeners’ judgments.
Method: Five ASR systems transcribed speech samples from 20 children with dysarthria. Additionally, 168 adult listeners provided orthographic transcriptions and EoU ratings. Word recognition rate (WRR) was used as the metric for calculating ASR and human listeners’ transcription accuracy. Spearman correlations were used to assess the relationship between ASR WRR and human WRR, as well as between ASR WRR and human EoU ratings.
Results: The WRR yielded by four ASR systems (WhisperX-small, WhisperX-medium, WhisperX-large, and Google Cloud) showed strong correlations with human WRR, with WhisperX-medium demonstrating the strongest correlation. These four systems’ WRRs also exhibited moderate-to-strong correlations with EoU ratings, with Google Cloud ASR showing the strongest correlation. In contrast, the WRR of Wav2Vec2 demonstrated a weak correlation with both human WRR and EoU ratings.
Conclusions: ASR shows promise for use in intelligibility assessment in children with dysarthria. Of the tested ASR systems, WhisperX-medium appears most promising for approximating human transcription accuracy, whereas Google Cloud ASR aligns best with perceptual ratings. Such differences in ASR performance highlight the need for careful system selection in clinical applications.
Supplemental Material S1. Raw and multiple-comparison–adjusted p-values (Holm–Bonferroni) for 10 Spearman correlations between ASR word recognition rates (WRR, %) and human perceptual measures (Human WRR, ease of understanding; EoU).
Supplemental Material S2. Word recognition rates (WRR, %) by speaker for ASR systems and human listeners, with age and dysarthria severity.
Choi, J., Moya-Galé, G., Hwang, K., Hirschberg, J., & Levy, E. S. (2026). Automatic speech recognition for intelligibility assessment in children with dysarthria. Journal of Speech, Language, and Hearing Research, 69(4), 1438–1454. https://doi.org/10.1044/2025_JSLHR-25-00562
Automatic speech recognition for childhood dysarthria (Choi et al., 2026)
figshare ASHA Publications · 2026-02-26
dataset · Open access · Senior author
Automatic Speech Recognition for Intelligibility Assessment in Children With Dysarthria
Journal of Speech Language and Hearing Research · 2026-02-26
article · Senior author
PURPOSE: Accurate assessment of speech intelligibility is critical for children with dysarthria secondary to cerebral palsy. Traditional assessment methods, such as human listeners' orthographic transcription and perceptual ratings (e.g., of ease of understanding [EoU]), are time consuming or subjective. Automatic speech recognition (ASR) may provide a more efficient, objective alternative, but its use for assessing intelligibility in this population is unexamined. This study evaluated the potential of ASR for intelligibility assessment in children with dysarthria and identified the most appropriate ASR systems for approximating human listeners' judgments. METHOD: Five ASR systems transcribed speech samples from 20 children with dysarthria. Additionally, 168 adult listeners provided orthographic transcriptions and EoU ratings. Word recognition rate (WRR) was used as the metric for calculating ASR and human listeners' transcription accuracy. Spearman correlations were used to assess the relationship between ASR WRR and human WRR, as well as between ASR WRR and human EoU ratings. RESULTS: The WRR yielded by four ASR systems (WhisperX-small, WhisperX-medium, WhisperX-large, and Google Cloud) showed strong correlations with human WRR, with WhisperX-medium demonstrating the strongest correlation. These four systems' WRRs also exhibited moderate-to-strong correlations with EoU ratings, with Google Cloud ASR showing the strongest correlation. In contrast, the WRR of Wav2Vec2 demonstrated a weak correlation with both human WRR and EoU ratings. CONCLUSIONS: ASR shows promise for use in intelligibility assessment in children with dysarthria. Of the tested ASR systems, WhisperX-medium appears most promising for approximating human transcription accuracy, whereas Google Cloud ASR aligns best with perceptual ratings. Such differences in ASR performance highlight the need for careful system selection in clinical applications.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.31397457.
International Journal of Speech-Language Pathology · 2025-10-24
article · Senior author
PURPOSE: Children with dysarthria due to cerebral palsy often face barriers to receiving speech-language pathology services. Using online videoconferencing from home could be an appropriate solution if audio-recordings from such technology yield valid measures of the children's speech. This study assessed the validity of acoustic measures obtained from online recordings of children with dysarthria from their homes. METHOD: Speech of 17 children with dysarthria was recorded from their homes simultaneously via two methods: 1) Online via Zoom and 2) offline via an audio-recording device. Nine commonly-assessed acoustic measures were obtained by each method and compared. Correlations and agreements between measures extracted from online and audio-device recordings were evaluated for whether they met predetermined criteria for validity. RESULT: Second-formant range of diphthongs, fricative-affricate duration difference, word duration/articulation rate, mean fundamental frequency, and sound-pressure-level range met the criteria for validity. In contrast, fundamental frequency range, signal-to-noise ratio, and cepstral peak prominence did not meet validity criteria. CONCLUSION: Findings support the validity of most commonly-analysed acoustic measures extracted from online recordings of children with dysarthria, suggesting that commercially-available videoconferencing technology could be an alternative to in-person evaluation. However, for perturbation- and noise-based measures, in-person recordings may still be necessary.
Production of American English consonants /v/ and /w/ by Hindi speakers of English
Journal of Second Language Pronunciation · 2025-06-10
article · Senior author
Previous research revealed that Hindi speakers identify American English (AE) phonemes /v/ and /w/ with only chance accuracy. Building on these findings, this study explored the production of AE /v/ and /w/ by Hindi speakers, utilizing both acoustic analysis of second formant (F2) onset and AE listeners’ ratings. Participants included two groups of Hindi-English bilinguals, one residing in the US for more than 5 years, one residing in India, and a group of monolingual AE speakers. Results indicated significant differences in F2 onset between AE speakers and Hindi groups, with AE speakers differentiating the consonants more than the Hindi speakers did. The F2 onset of the Hindi speakers who had resided in the US differed from the F2 onsets produced by those with no AE immersion experience in certain conditions only. AE listeners rated only a few productions from Hindi speakers as accurate representations of AE /v/ and /w/. AE /v/-/w/ is difficult for Hindi speakers to produce contrastively, even for those who have resided in the US for several years.
Journal of Speech Language and Hearing Research · 2024-04-04 · 3 citations
article · Senior author
PURPOSE: Reduced speech intelligibility is often a hallmark of children with dysarthria secondary to cerebral palsy (CP), but effects of speech strategies for increasing intelligibility are understudied, especially in children who speak languages other than English. This study examined the effects of (the Korean translation of) two cues, "speak with your big mouth" and "speak with your strong voice," on speech acoustics and intelligibility of Korean-speaking children with CP. METHOD: Fifteen Korean-speaking children with CP repeated words and sentences in habitual, big mouth, and strong voice conditions. Acoustic analyses were performed and intelligibility was assessed by means of 90 blinded listeners' ease-of-understanding (EoU) ratings and percentage of words correctly transcribed (PWC). RESULTS: In response to both cues, children's vocal intensity and utterance duration increased significantly and differentially, whereas their vowel space area gains did not reach statistical significance. EoU increased significantly in the big mouth condition at word, but not sentence, level, whereas in the strong voice condition, EoU increased significantly at both levels. PWC increases were not statistically significant. Considerable variability in children's responses to cues was noted overall. CONCLUSIONS: Korean-speaking children with CP modify their speech styles differentially when provided with cues aimed to increase their articulatory working space and vocal intensity. The results provide preliminary support for the use of the strong voice cue, in particular, to increase EoU. While the findings do not offer conclusive evidence of the intelligibility benefits of these cues, investigation with a larger sample size should provide further insight into optimal cueing strategies for increasing intelligibility in this population. Implications for language-specific versus language-independent treatment approaches are discussed.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.25521052.
American Journal of Speech-Language Pathology · 2024-04-01 · 3 citations
article · Open access · Senior author
PURPOSE: International cleft lip and palate surgical charities recognize that speech therapy is essential for successful care of individuals after palate repair. The challenge is how to ensure that cleft speech interventionists (i.e., speech-language pathologists and other speech therapy providers) provide quality care. This exploratory study investigated effects of a two-stage cleft training in Oaxaca, Mexico, aimed at preparing speech interventionists to provide research-based services to individuals born with cleft palate. Changes in the interventionists' content knowledge and clinical skills were examined. METHOD: Twenty-three cleft speech interventionists from Mexico, Guatemala, and Nicaragua participated in a hybrid two-stage training, completing an online Spanish cleft speech course and a 5-day in-person training in Oaxaca. In-person training included a didactic component and supervised clinical practice with 14 individuals with repaired cleft palates. Testing of interventionists' content knowledge and clinical skills via questionnaires occurred before the online course (Test 1), immediately before in-person training (Test 2), and immediately after in-person training (Test 3). Qualitative data on experience/practice were also collected. RESULTS: Significant increases in interventionists' overall content knowledge and clinical skills were found posttraining. Knowledge and clinical skills increased significantly between Tests 1 and 2. Clinical skills, but not knowledge, showed further significant increases between Tests 2 and 3. Posttraining, interventionists demonstrated greater expertise in research-based treatment, and fewer reported they would use nonspeech oral motor exercises (NSOME). CONCLUSIONS: Findings provide preliminary support for such two-stage international trainings in preparing local speech interventionists to deliver high-quality speech services to individuals born with cleft palate. While content knowledge appears to be acquired primarily from the online course, the two-stage training incorporating in-person supervised practice working with individuals born with cleft palate may best enhance continued clinical skill development, including replacement of NSOME with evidence-based speech treatment. Such trainings contribute to building capacity for sustainable quality services for this population in underresourced regions.
Revisiting Dysarthria Treatment Across Languages: The Hybrid Approach
Journal of Speech Language and Hearing Research · 2023-12-06 · 11 citations
article · 1st author · Corresponding
PURPOSE: Ten years after Miller and Lowit's (2014) groundbreaking book providing a cross-linguistic perspective on motor speech disorders, we ask where we are regarding dysarthria treatment across languages in two specific populations: adults with Parkinson's disease (PD) and children with cerebral palsy (CP). METHOD: In this commentary, we consider preliminary evidence for both language-independent and language-specific approaches to treatment and propose a hybrid approach to speech treatment across languages, centered on the individual with dysarthria who speaks any given language. CONCLUSIONS: Treatment research on individuals with dysarthria secondary to PD and CP is advancing, but several areas remain to be explored. Next steps are suggested for addressing the paucity and complexity of cross-linguistic speech treatment research.
Journal of Speech Language and Hearing Research · 2022-12-12 · 5 citations
article · Senior author
Purpose: The purpose of this study was to examine selected baseline acoustic features of hypokinetic dysarthria in Spanish speakers with Parkinson's disease (PD) and identify potential acoustic predictors of ease of understanding in Spanish. Method: Seventeen Spanish-speaking individuals with mild-to-moderate hypokinetic dysarthria secondary to PD and eight healthy controls were recorded reading a translation of the Rainbow Passage. Acoustic measures of vowel space area, as indicated by the formant centralization ratio (FCR), envelope modulation spectra (EMS), and articulation rate were derived from the speech samples. Additionally, 15 healthy adults rated ease of understanding of the recordings on a visual analogue scale. A multiple linear regression model was implemented to investigate the predictive value of the selected acoustic parameters on ease of understanding. Results: Listeners' ease of understanding was significantly lower for speakers with dysarthria than for healthy controls. The FCR, EMS from the first 10 s of the reading passage, and the difference in EMS between the end and the beginning sections of the passage differed significantly between the two groups of speakers. Findings indicated that 67.7% of the variability in ease of understanding was explained by the predictive model, suggesting a moderately strong relationship between the acoustic and perceptual domains. Conclusions: Measures of envelope modulation spectra were found to be highly significant model predictors of ease of understanding of Spanish-speaking individuals with hypokinetic dysarthria associated with PD. Articulation rate was also found to be important (albeit to a lesser degree) in the predictive model. The formant centralization ratio should be further examined with a larger sample size and more severe dysarthria to determine its efficacy in predicting ease of understanding.
International Journal of Language & Communication Disorders · 2022-04-01 · 10 citations
article
BACKGROUND: Individuals with developmental dysarthria typically demonstrate reduced functioning of one or more of the speech subsystems, which negatively impacts speech intelligibility and communication within social contexts. A few treatment approaches are available for improving speech production and intelligibility among individuals with developmental dysarthria. However, these approaches have only limited application and research findings among adolescents and young adults. AIMS: To determine and compare the effectiveness of two treatment approaches, the modified Speech Intelligibility Treatment (mSIT) and the Beatalk technique, on speech production and intelligibility among Hebrew-speaking adolescents and young adults with developmental dysarthria. METHODS & PROCEDURES: Two matched groups of adolescents and young adults with developmental dysarthria participated in the study. Each received one of the two treatments, mSIT or Beatalk, over the course of 9 weeks. Measures of speech intelligibility, articulatory accuracy, voice and vowel acoustics were assessed both pre- and post-treatment. OUTCOMES & RESULTS: Both the mSIT and Beatalk groups demonstrated gains in at least some of the outcome measures. Participants in the mSIT group exhibited improvement in speech intelligibility and voice measures, while participants in the Beatalk group demonstrated increased articulatory accuracy and gains in voice measures from pre- to post-treatment. Significant increases were noted post-treatment for first formant values for select vowels. CONCLUSIONS & IMPLICATIONS: Results of this preliminary study are promising for both treatment approaches. The differentiated results indicate their distinct application to speech intelligibility deficits. The current findings also hold clinical significance for treatment among adolescents and young adults with motor speech disorders and application for a language other than English.
WHAT THIS PAPER ADDS
What is already known on the subject: Developmental dysarthria (e.g., secondary to cerebral palsy) is a motor speech disorder that negatively impacts speech intelligibility, and thus communication participation. Select treatment approaches are available with the aim of improving speech intelligibility in individuals with developmental dysarthria; however, these approaches are limited in number and have only seldomly been applied specifically to adolescents and young adults.
What this paper adds to existing knowledge: The current study presents preliminary data regarding two treatment approaches, the mSIT and Beatalk technique, administered to Hebrew-speaking adolescents and young adults with developmental dysarthria in a group setting. Results demonstrate the initial effectiveness of the treatment approaches, with different gains noted for each approach across speech and voice domains.
What are the potential or actual clinical implications of this work? The findings add to the existing literature on potential treatment approaches aiming to improve speech production and intelligibility among individuals with developmental dysarthria. The presented approaches also show promise for group-based treatments as well as the potential for improvement among adolescents and young adults with motor speech disorders.
Recent grants
NIH · $24k · 2004
Frequent coauthors
- Gemma Moya‐Galé · Columbia University · 22 shared
- Mira Goral · Lehman College · 21 shared
- Megan J. McAuliffe · 19 shared
- Loraine K. Obler · 13 shared
- Winifred Strange · The Graduate Center, CUNY · 13 shared
- Valeriy Shafiro · Rush University · 12 shared
- Younghwa M. Chang · Google (United States) · 11 shared
- D. H. Whalen · City University of New York · 10 shared
Labs
Speech Production and Perception Lab · PI
The lab studies speech production and perception, including treatment efficacy for motor speech disorders, cross-language speech perception, and the bilingual lexicon.
Education
Ph.D., Speech and Hearing Sciences
CUNY Graduate Center
M.A., Speech-Language Pathology
Lehman College
M.A., Linguistics
New York University
B.A., Psychology
Wesleyan University