Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime
Mark Liberman

Christopher H. Browne Distinguished Professor of Linguistics · Phonetics, prosody, natural language processing, speech communication · Verified

University of Pennsylvania · Linguistics

Active 1976–2026

h-index: 41
Citations: 9.4k
Papers: 335 (124 in the last 5 years)
Funding: $1.3M

About

Mark Liberman is the Christopher H. Browne Distinguished Professor of Linguistics and Director of the Linguistic Data Consortium at the University of Pennsylvania. He holds appointments in both the Department of Linguistics and the Department of Computer and Information Science, and serves as Faculty Director of Ware College House. His work spans a broad range of topics in linguistics and cognitive science, including corpus-based phonetics, the phonology and phonetics of lexical tone and its relationship to intonation, and formal models for linguistic annotation. He also applies linguistic analysis to legal, medical, and political questions. His teaching covers introductory linguistics, computational analysis, phonetics, big data in linguistics, and advanced topics such as deep learning and large language models in linguistic research. Across these areas, his research combines computational methods with linguistic theory and practical applications.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Natural Language Processing
  • Speech recognition
  • Psychology
  • Linguistics

Selected publications

  • Automatic detection of autism using large vision-language models: A preliminary analysis.

    2026-01-01

    article

    Robert T. Schultz, & Julia Parish-Morris.

  • Speaker role identification in clinical conversations

    Faculty of 1000 Research Ltd · 2025-01-01

    other · Open access
  • Social Context Matters for Turn‐Taking Dynamics: A Comparative Study of Autistic and Typically Developing Children

    UNC Libraries · 2025-10-23

    article · Open access

    Engaging in fluent conversation is a surprisingly complex task that requires interlocutors to promptly respond to each other in a way that is appropriate to the social context. In this study, we disentangled different dimensions of turn-taking by investigating how the dynamics of child-adult interactions changed according to the activity (task-oriented vs. freer conversation) and the familiarity of the interlocutor (familiar vs. unfamiliar). Twenty-eight autistic children (16 male; mean age 10.8 years) and 20 age-matched typically developing children (8 male; mean age 9.6 years) participated in seven task-oriented face-to-face conversations with their caregivers (336 total conversations) and seven more telephone conversations alternately with their caregivers (144 total conversations, 60 with the typical development group) and an experimenter (191 total conversations, 112 with the autism group). By modeling inter-turn response latencies in multi-level Bayesian location-scale models, we found that inter-turn response latencies were consistent across repeated measures within social contexts, but exhibited substantial differences across social contexts. Autistic children exhibited more overlaps, produced faster response latencies and shorter pauses than typically developing children, and these group differences were stronger when conversing with the unfamiliar experimenter. Unfamiliarity also made the relation between individual differences and latencies evident: only in conversations with the experimenter were higher sociocognitive skills and lower social awareness associated with faster responses. Information flow and shared tempo were also influenced by familiarity: children adapted their response latencies to the predictability and tempo of their interlocutor's turn, but only when interacting with their caregivers and not the experimenter. These results highlight the need to construe turn-taking as a multicomponential construct that is shaped by individual differences, interpersonal dynamics, and the affordances of the context.

  • Relation between Depression Dimensions and Speech Acoustic and Emotion‐based Features

    Alzheimer's & Dementia · 2025-12-01

    article · Open access

    BACKGROUND: Depression is a common and highly heterogeneous disorder in older adults, often linked to faster cognitive decline. While standard questionnaires are subjective, speech analysis may offer a more objective method for characterizing depression. This study investigates the relationship between speech features and depression dimensions in participants at the Mount Sinai Alzheimer's Disease Research Center (ADRC). METHOD: Participants included healthy controls (n = 31) and individuals with mild cognitive impairment (MCI, n = 22) and Alzheimer's disease (AD, n = 16). They described three pictures with neutral, negative, and positive themes. Speech was processed with automated pipelines, and emotion-based variables and acoustic features were used for analysis. Depression dimensions (dysphoria, apathy, hopelessness, and memory complaints) were assessed with the Geriatric Depression Scale-15 (GDS-15). Mixed-model regression was used to assess the relationship between depression dimensions and the emotional nature of the pictures. RESULT: Among the 73 participants (41% male, average age 80.14 ± 8.01), dysphoria and apathy had opposite associations with emotion-based measures. Dysphoria was associated with higher valence (more positive emotion), while apathy was associated with lower valence. Subjective memory complaints were also associated with lower-valence words. Further analysis revealed that apathy was associated with lower pitch and slower speech when describing negative pictures, and dysphoria with a wider pitch range and faster speech for negative pictures. Patients with memory complaints used a narrower pitch range in both positive and negative tasks. CONCLUSION: Dysphoria was associated with heightened emotional reactivity, while apathy showed decreased reactivity. Apathy and memory complaints shared similar speech features. Our preliminary results support the use of speech features in distinguishing different depression dimensions even at the subclinical level, offering promising opportunities for using technology to enhance both understanding and diagnosis of depression.

  • Social Context Matters for Turn‐Taking Dynamics: A Comparative Study of Autistic and Typically Developing Children

    Cognitive Science · 2025-10-01 · 3 citations

    article · Open access

  • Age distribution of speech duration measures in healthy individuals with picture description tasks

    Alzheimer's & Dementia · 2025-12-01

    article · Open access

    BACKGROUND: Picture description tasks have been used to gauge communication capabilities of neurodegenerative patients. Previous studies have demonstrated that speech duration measures can help distinguish between healthy and neurodegenerative individuals, or even among patients of different neurodegenerative diseases. Hence, it is crucial to establish the baseline of those measures in healthy individuals in a wide age range, as deviations could indicate neurodegeneration. We collected picture description task responses from healthy volunteers to quantify such a baseline, which can aid early detection of neurodegenerative diseases. METHODS: There were 290 healthy participants with a wide distribution of ages, from 15 to 90 years (M=49.9, SD=18.2), who voluntarily participated in picture description tasks online. The pictures included the Cookie Theft scene, the Picnic scene, and two similar pictures we designed; some participants completed all four, while others completed only some of them (M=2.6, SD=1.3). An in-house speech activity detector program was employed to segment audio files into speech and pause segments automatically. We built linear mixed-effects models to examine the effects of age on quantitative speech duration measures, including mean speech and pause segment durations, speech percentage (proportion of speech in the entire recording time), and pause rate (average pause count per minute). Sex, education levels, and picture types were included as fixed effects, and participant IDs as random effects. RESULTS: Older participants showed lower speech percentages (β=-0.079, SE=0.021, p <.001) and lower mean speech durations (β=-0.004, SE=0.002, p = .022), but the decline failed to replicate in total speech duration (β=-0.051, SE=0.079, p = .512). Meanwhile, pause rates increased with age (β=0.065, SE=0.022, p = .003), as with mean pause durations (β=0.002, SE=0.000, p = .001) and total pause durations (β=0.059, SE=0.023, p = .011). 
Total durations did not yield significant results (β=0.007, SE=0.095, p = .943). CONCLUSION: We demonstrated specific ageing trends across various speech duration measures. The different pictures and the demographic variance of participants support the reliability of our results. These outcomes contribute to the understanding of how speech duration patterns change with age, establishing a baseline to estimate the deviation from healthy ageing at different ages.

  • Evaluating Speech-to-Text Systems with PennSound

    ArXiv.org · 2025-04-08

    preprint · Open access

    A random sample of nearly 10 hours of speech from PennSound, the world's largest online collection of poetry readings and discussions, was used as a benchmark to evaluate several commercial and open-source speech-to-text systems. PennSound's wide variation in recording conditions and speech styles makes it a good representative of many other untranscribed audio collections. Reference transcripts were created by trained annotators, and system transcripts were produced with AWS, Azure, Google, IBM, NeMo, Rev.ai, Whisper, and Whisper.cpp. Based on word error rate (WER), Rev.ai was the top performer, and Whisper was the top open-source performer (as long as hallucinations were avoided). AWS had the best diarization error rate (DER) among three systems. However, WER and DER differences were slim, and various tradeoffs may motivate choosing different systems for different end users. We also examine the issue of hallucinations in Whisper. Users of Whisper should be aware of its runtime options and consider whether the speed-versus-accuracy tradeoff is acceptable.

  • Speaker Role Identification in Clinical Conversations

    2025-12-01

    article · Open access

    Patient-clinician communication research is crucial for understanding interaction dynamics and for predicting outcomes that are associated with clinical discourse. Traditionally, interaction analysis is conducted manually because of challenges such as Speaker Role Identification (SRI), which must reliably differentiate between doctors, medical assistants, patients, and other caregivers in the same room. Although automatic speech recognition with diarization can efficiently create a transcript with separate labels for each speaker, these systems are not able to assign roles to each person in the interaction. Previous SRI studies in task-oriented scenarios have directly predicted roles using linguistic features, bypassing diarization. However, to our knowledge nobody has investigated SRI in clinical settings. We explored whether Large Language Models (LLMs) such as BERT could accurately identify speaker roles in clinical transcripts, with and without diarization. We used veridical turn segmentation and diarization identifiers, fine-tuning each model at varying levels of identifier corruption to assess impact on performance. Our results demonstrate that BERT achieves high performance with linguistic signals alone (82% accuracy/82% F1-score), while incorporating accurate diarization identifiers further enhances accuracy (95%/95%). We conclude that fine-tuned LLMs are effective tools for SRI in clinical settings.

  • Decoding Dementia from Speech: Acoustic‐Lexical Integration for Detecting Alzheimer's Disease in Older Korean Adults

    Alzheimer's & Dementia · 2025-12-01

    article · Open access

    BACKGROUND: Early detection of mild cognitive impairment due to Alzheimer's disease (MCI d/t AD), as well as AD dementia (ADD), is critical for timely intervention. Speech analysis offers a non-invasive way to detect subtle cognitive deficits. This study explores the utility of acoustic and lexical features in classifying older Korean adults across three clinical scenarios: (1) HC vs. AD (MCI d/t AD & ADD) for screening, (2) Non-dementia (HC & MCI d/t AD) vs. ADD for detecting advanced pathology, and (3) HC vs. ADD for assessing the most divergent clinical states. We aim to demonstrate the feasibility of speech-based methods for supporting more timely interventions. METHOD: We recruited 110 older Korean adults (HC=55, MCI d/t AD=29, ADD=26). Groups did not differ in gender (p = .372) or education (p = .278). However, the MCI d/t AD group was older (77.79±5.27) than the HC (72.51±6.38) and ADD (73.35±7.48) groups (p = .002), whereas there was no significant difference between HC and ADD. Cognitive measures (MMSE, CDR; both p <.001) differed significantly. All MCI d/t AD and ADD patients were beta-amyloid positive in PET scans. Speech was collected via recording from neuropsychological tests and additional tasks (Korean phonemic/semantic fluency, vowel phonation, picture description). Acoustic and lexical features were extracted with openSMILE (emobase, 988-dimensional) and a pretrained Korean RoBERTa model (768-dimensional). Principal component analysis was applied to each feature set. Three classification models were built using (1) acoustic-only, (2) lexical-only, and (3) an ensemble of acoustic and lexical features. Each model was implemented through a multilayer perceptron and evaluated with 5-fold cross-validation. RESULT: In our experiments, ensemble models outperformed single-feature-based models (Table 1). For HC vs. AD, the ensemble model achieved 75.8% accuracy and 0.756 AUC; for Non-dementia vs. ADD, 85.1% accuracy and 0.801 AUC; and for HC vs. ADD, 87.0% accuracy and 0.893 AUC. Combining acoustic and lexical features provided complementary information, reflecting vocal characteristics and language-based deficits. CONCLUSION: These findings demonstrate that speech-derived features can detect cognitive impairment in older Korean adults across multiple diagnostic scenarios, enabling earlier and more targeted interventions. Moreover, this non-invasive approach may ease clinical workflows and broaden screening accessibility, particularly in resource-limited settings.

  • Automated speech and language markers of longitudinal changes in psychosis symptoms

    NPP—Digital Psychiatry and Neuroscience · 2025-06-17 · 10 citations

    article · Open access

    We sought to evaluate the ability of automated speech and language features to longitudinally track fluctuations in the major psychosis domains: Thought Disorder, Negative Symptoms, and Positive Symptoms. Sixty-six participants with psychotic disorders were assessed soon after inpatient admission, at discharge, and at 3 and 6 months. Psychosis symptoms were measured with semi-structured interviews and standardized scales. Recordings were collected from paragraph reading, fluency, picture description, and open-ended tasks. Relationships between psychosis symptoms and 357 automated speech and language features were analyzed with linear mixed models, using both a single component score and individual features. We found that all three domains demonstrated significant longitudinal relationships with the single component score. Thought Disorder was particularly related to features describing more subordinated constructions, less efficient identification of picture elements, and decreased semantic distance between sentences. Negative Symptoms were related to features describing decreased speech complexity. The Positive Symptoms domain score did not show relationships with individual features that survived p-value correction, but Suspiciousness was related to decreased use of nouns and Hallucinations was related to greater semantic distances. These relationships were largely robust to interactions with gender and race. Interactions with timepoint revealed variable relationships during different phases of illness (acute vs. stable). In summary, automated speech and language features show promise as scalable, objective markers of psychosis severity. Detailed attention to clinical setting and patient population is needed to optimize clinical translation.

Education

  • Ph.D., Phonetics, prosody, natural language processing, speech communication

    MIT

    1975

See your match with Mark Liberman

  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

  • Free to start
  • No credit card
  • 30-second signup