
H. Andrew Schwartz
Research Associate Professor · Stony Brook University · Computer Science
Active 1966–2025
About
H. Andrew Schwartz is an Associate Professor in the Department of Computer Science at Stony Brook University and the director of the Human Language Analysis Lab (HLAB). He is also a PI and co-founder of the World Well-Being Project, a multidisciplinary consortium involving the University of Pennsylvania, Stony Brook University, and Stanford University, focused on developing large-scale language analyses to reveal differences in health, personality, and well-being. Schwartz received his PhD in computer science from the University of Central Florida in 2011 and previously served as a postdoctoral fellow and lead research scientist at the University of Pennsylvania. His research utilizes natural language processing and machine learning techniques to advance the state of the art in AI-driven language modeling, emphasizing modeling language within human, social, cognitive, and temporal contexts. Additionally, his work investigates language and speech as a window into the human condition, including mental health, fundamental human traits, and behavioral motives. Schwartz is an active contributor to the fields of AI-natural language processing, health informatics, and psychology. He has been recognized with awards such as the 2020 DARPA Young Faculty Award and the 2022 Research Excellence Award from Stony Brook CS. He is also involved in public service groups like the UN Global Working Group on Big Data for Official Statistics and has contributed to the development of tools such as the R-Text package and the Python Differential Language Analysis ToolKit (DLATK), used in over 100 studies and by various tech companies.
Research topics
- Computer Science
- Sociology
- Natural Language Processing
- World Wide Web
- Psychology
- Social Science
- Social psychology
- Artificial Intelligence
- Data science
- Machine Learning
- Statistics
- Epistemology
- Engineering
- Cartography
- Mathematics
- Geography
- Cognitive science
- Linguistics
- Demography
- Econometrics
- Cognitive psychology
Selected publications
Day-to-day dynamics of facial emotion expressions in posttraumatic stress disorder
Journal of Affective Disorders · 2025-03-22
article
Comprehensive Psychiatry · 2025-05-13 · 3 citations
article · Open access
Accurate assessments of symptoms and illnesses are essential for health research and clinical practice but face many challenges. The absence of a single error-free measure is currently addressed by assessment methods involving experts reviewing several sources of information to achieve a best-estimate assessment. This assessment method is called the Expert Panel method in medicine, and the Best-Estimate Diagnosis or Longitudinal Expert All Data (LEAD) method in psychiatry and psychology. However, due to poor reporting of the assessment method, the quality of proclaimed best-estimate assessments is typically difficult to evaluate, and when the method is reported, the reporting quality varies substantially. To tackle this gap, we have developed a reporting guideline following a four-stage approach: 1) drafting reporting standards accompanied by empirical evidence, which were further developed with a patient organization for depression, 2) incorporating expert feedback through a two-round Delphi procedure, 3) refining the guideline based on an expert consensus meeting, and 4) testing the guideline by i) having researchers test it and ii) applying it to previously published studies. The last step also provides evidence for the need for the guideline: 10–63% (mean 33%) of the standards were not reported across thirty randomly selected previously published studies. The result is the LEADING guideline comprising 20 reporting standards in four groups: the Longitudinal design, the Appropriate data, the Evaluation – experts, materials and procedures, and the Validity group. We hope that the LEADING guideline will assist researchers in planning, conducting, reporting, and evaluating research aiming to achieve best-estimate assessments.
- Similar assessment methods across psychiatry, medicine, and psychology involve experts reviewing several sources of (longitudinal) information to achieve best-estimate assessments.
- However, the quality of these assessments is difficult to evaluate due to poor reporting of the assessment method, and when the method is reported, the reporting quality varies substantially.
- To tackle this gap, we have developed the LEADING guideline including 20 reporting standards related to four groups: the Longitudinal design, the Appropriate data, the Evaluation – experts, materials and procedures, and the Validity group.
- Applying the guideline to 30 randomly selected previously published studies shows that 10–63% (mean 33%) of the standards were not reported, which shows the need for a guideline.
- We hope that the LEADING guideline will support researchers in planning, reporting, and evaluating research that aims to achieve best-estimate assessments.
Day-to-day dynamics of facial emotion expressions in posttraumatic stress disorder
2025-02-26
preprint · Open access
Facial expressions are an essential component of emotions that may reveal mechanisms maintaining posttraumatic stress disorder (PTSD). However, most research on emotions in PTSD has relied on self-reports, which only capture subjective affect. The few studies on outward emotion expressions have been hampered by methodological limitations, including low ecological validity and failure to capture the dynamic nature of emotions and symptoms. Our study addresses these limitations with an approach that has not been applied to psychopathology: person-specific models of day-to-day facial emotion expression and PTSD symptom dynamics. We studied a sample of World Trade Center responders (N = 112) with elevated PTSD pathology who recorded a daily video diary and self-reported symptoms for 90 days (8,953 videos altogether). Facial expressions were detected from video recordings with a facial emotion recognition model. In data-driven, idiographic network models, most participants (80%) had at least one reliable expression-symptom link. Six expression-symptom dynamics were significant for >10% of the sample. Each of these dynamics had statistically meaningful heterogeneity, with some people’s symptoms related to over-expressivity and others to under-expressivity. Our results provide the foundation for a more complete understanding of emotions in PTSD that not only includes subjective feelings but also outward emotion expressions.
ArXiv.org · 2025-01-15
preprint · Open access · Senior author
Current speech encoding pipelines often rely on an additional text-based LM to get robust representations of human communication, even though SotA speech-to-text models often have an LM within. This work proposes an approach to improve the LM within an audio model such that the subsequent text LM is unnecessary. We introduce WhiSPA (Whisper with Semantic and Psychological Alignment), which leverages a novel audio training objective: contrastive loss with a language model embedding as a teacher. Using over 500k speech segments from mental health audio interviews, we evaluate the utility of aligning Whisper's latent space with semantic representations from a text autoencoder (SBERT) and lexically derived embeddings of basic psychological dimensions: emotion and personality. Over self-supervised affective tasks and downstream psychological tasks, WhiSPA surpasses current speech encoders, achieving an average error reduction of 73.4% and 83.8%, respectively. WhiSPA demonstrates that it is not always necessary to run a subsequent text LM on speech-to-text output in order to get a rich psychological representation of human communication.
Assessment · 2025-09-20 · 2 citations
article · Open access
Large language models can transform individuals’ mental health descriptions into scores that correlate with rating scales approaching theoretical upper limits. However, such analyses have combined word and text responses, with little known about their differences. We develop response formats ranging from closed-ended to open-ended: (a) select words from lists, or write (b) descriptive words, (c) phrases, or (d) texts. Participants answered questions about their depression/worry using the response formats and related rating scales. Language responses were transformed into word embeddings and trained to rating scales. We compare the validity (concurrent, incremental, face, discriminant, and external validity) and reliability (prospective sample and test–retest reliability) of the response formats. Using the Sequential Evaluation with Model Pre-Registration design, machine-learning models were trained on a development dataset (N = 963) and then pre-registered before being tested on a prospective sample (N = 145). The pre-registered models demonstrate strong validity and reliability, yielding high accuracy in the prospective sample (r = .60–.79). Additionally, the models demonstrated external validity to self-reported sick-leave/healthcare visits, where the text format yielded the strongest correlations (being higher than or equal to rating scales for 9 of 12 cases). The overall high validity and reliability across formats suggest the possibility of choosing formats according to clinical needs.
Residualized Similarity for Faithfully Explainable Authorship Verification
2025-01-01
article · Open access · Senior author
Responsible use of authorship verification (AV) systems requires not only high accuracy but also interpretable solutions. Specifically, for systems to be deployed in contexts where decisions have real-world consequences, their predictions must be explainable through interpretable features that can be traced to the original text. Neural methods achieve high accuracies, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully: if there is an explanation given for a prediction, it doesn't represent the reasoning process behind the model's prediction. To address this gap, we introduce residualized similarity (RS), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. Authorship verification is fundamentally a similarity task, where the goal is to measure how likely two documents are to be written by the same author. The key idea is to use a neural network to predict a residual similarity, i.e., the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can show how and to what degree the final prediction is faithful and interpretable.
Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies
ArXiv.org · 2025-09-20
preprint · Open access · Senior author
State-of-the-art automatic speech recognition (ASR) models like Whisper perform poorly on atypical speech, such as that produced by individuals with dysarthria. Past works for atypical speech have mostly investigated fully personalized (or idiosyncratic) models, but modeling strategies that can both generalize and handle idiosyncrasy could be more effective for capturing atypical speech. To investigate this, we compare four strategies: (a) normative models trained on typical speech (no personalization), (b) idiosyncratic models completely personalized to individuals, (c) dysarthric-normative models trained on other dysarthric speakers, and (d) dysarthric-idiosyncratic models, which combine strategies by first modeling normative patterns before adapting to individual speech. In this case study, we find the dysarthric-idiosyncratic model performs better than the idiosyncratic approach while requiring less than half as much personalized data (36.43 WER with 128 train size vs 36.99 with 256). Further, we found that tuning the speech encoder alone (as opposed to the LM decoder) yielded the best results, reducing word error rate from 71% to 32% on average. Our findings highlight the value of leveraging both normative (cross-speaker) and idiosyncratic (speaker-specific) patterns to improve ASR for underrepresented speech populations.
2025-09-01
preprint · Open access
Language-based assessments (LBAs), quantitative estimates of scientific constructs based on language, have advanced methods in the psychological and social sciences for over a decade. LBAs based on individuals’ prompted descriptions analysed with large language models to produce scores of their psychological states and traits have shown strong convergence with the corresponding rating scales (r > .80) and have often surpassed rating scales in predicting theoretically relevant behaviours (external criteria). Despite their high validity across numerous psychological outcomes and contexts, the broader adoption of LBA models has been limited. Even when made available alongside research publications, these models often remain inaccessible due to technical complexities, inconsistent documentation, and the absence of a standardized repository. This tutorial introduces a framework targeted to social and psychological scientists for accessibly sharing models with others, the Language-Based Assessment Models (L-BAM) Library, as well as a toolkit for easily using L-BAMs via the text package in R. L-BAM covers a wide range of models for assessing mental health disorders (e.g., depression, anxiety), well-being (e.g., satisfaction with life, harmony in life), implicit motives (need for power, affiliation and achievement) and more. The L-BAM library aims to increase the availability and resource efficiency of language-based assessments of psychological constructs while encouraging replication, independent validation and the broad application of pre-existing language-based assessment models.
2025-09-24
preprint · Open access
Measuring the subjective well-being of societies is important in its own right and as a determinant of health outcomes. Traditionally, self-report surveys such as Gallup’s daily poll have tracked well-being, but they are expensive and provide limited coverage of communities. Over the past decade, research has shown that social media (e.g., Twitter/X) offers a cost-effective alternative. Yet, through ownership and policy changes, access to these “digital commons” is increasingly restricted. What are we losing without access for researchers? We argue that well-being estimates derived from geolocated, demographically post-stratified Twitter language provided the most valid indicators of US population well-being available. We compare estimates for 1,208 US counties (~89% of the population) derived from 1.53 billion posts by 5.25 million users to Gallup estimates from 1.9 million survey responses. Twitter-based estimates were more predictive of external health and economic indicators and met a wide variety of validity criteria, including a convergent correlation of r = .70 with Gallup, high test-retest stability, linguistic face validity, and generalizability across US cities, states, and the UK. We further show that Gallup ground truth data is not required by building an independent language model on n = 9,419 Twitter users, which produced valid estimates that again outperformed Gallup in predicting external variables. These findings establish that social media can capture population well-being more robustly and with better coverage than even the largest survey efforts, and that ensuring researcher access to these data is essential for understanding and improving societies.
Frequent coauthors
- Lyle Ungar · University of Pennsylvania · 93 shared
- Johannes C. Eichstaedt · 51 shared
- Salvatore Giorgi · University of Pennsylvania · 46 shared
- Margaret L. Kern · University of Melbourne · 36 shared
- Oscar Kjell · 30 shared
- Shawndra Hill · 23 shared
- David B. Yaden · Johns Hopkins University · 22 shared
- Patrick Crutchley · 22 shared
Education
- 2015
Visiting Assistant Professor, Computer and Information Science
University of Pennsylvania
Awards & honors
- 2022 Research Excellence Award, Stony Brook CS
- DARPA Young Faculty Award (2020)
- Altmetric top 25 most talked about research papers in 2013