Hilke Schellmann

Professor · Verified

New York University · Journalism

Active 2022–2026

h-index 3

Citations 22

Papers55 last 5y

Funding—

About

Professor Hilke Schellmann is a faculty member in the Department of Journalism at New York University. She is an Emmy Award-winning reporter who investigates artificial intelligence tools, their impact, and the power they hold over many people. Her work develops new methods for examining and understanding AI technologies, with an emphasis on their societal implications.

Research topics

Computer Science
Machine Learning
Psychology
Applied psychology
Social psychology
Artificial Intelligence
Natural Language Processing
Accounting
Clinical psychology
Data science
Linguistics
Business
Philosophy
Speech recognition

Selected publications

When Care Becomes Code
Scientific American · 2026-02-17
article · 1st author · Corresponding
The case for stakeholder-driven AI auditing in automatic speech recognition
Nature Machine Intelligence · 2026-03-16
article
Careless Whisper: Speech-to-Text Hallucination Harms
arXiv (Cornell University) · 2024 · 68 citations
- Computer Science
- Natural Language Processing
- Linguistics
Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate OpenAI’s Whisper, a state-of-the-art automated speech recognition service that outperformed industry competitors as of 2023. While many of Whisper’s transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority. We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (who have a lowered ability to express themselves using speech and voice) and a control group. We find that hallucinations disproportionately occur for individuals who speak with longer shares of non-vocal durations, a common symptom of aphasia. We call on industry practitioners to ameliorate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases amplified by hallucinations in downstream applications of speech-to-text models.
An external stability audit framework to test the validity of personality prediction in AI hiring
Data Mining and Knowledge Discovery · 2022 · 28 citations
- Computer Science
- Machine Learning
- Psychology
Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing, and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the validity of such systems using stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. Crucially, rather than challenging or affirming the assumptions made in psychometric testing - that personality is a meaningful and measurable construct, and that personality traits are indicative of future success on the job - we frame our audit methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves. Our main contribution is the development of a socio-technical framework for auditing the stability of algorithmic systems. This contribution is supplemented with an open-source software library that implements the technical components of the audit, and can be used to conduct similar stability audits of algorithmic systems. We instantiate our framework with the audit of two real-world personality prediction systems, namely, Humantic AI and Crystal. The application of our audit framework demonstrates that both these systems show substantial instability with respect to key facets of measurement, and hence cannot be considered valid testing instruments.
Resume Format, LinkedIn URLs and Other Unexpected Influences on AI Personality Prediction in Hiring: Results of an Audit
2022 · 15 citations
- Computer Science
- Artificial Intelligence
- Machine Learning
Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing, and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the reliability of such systems using stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. We develop a methodology for an external audit of stability of algorithmic personality tests, and instantiate this methodology in an audit of two systems, Humantic AI and Crystal. Rather than challenging or affirming the assumptions made in psychometric testing -- that personality traits are meaningful and measurable constructs, and that they are indicative of future success on the job -- we frame our methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves.
An External Stability Audit Framework to Test the Validity of Personality Prediction in AI Hiring
arXiv (Cornell University) · 2022-01-23 · 2 citations
preprint · Open access
Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing, and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the validity of such systems using stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. Our approach is to (a) develop a methodology for an external audit of stability of predictions made by algorithmic personality tests, and (b) instantiate this methodology in an audit of two systems, Humantic AI and Crystal. Crucially, rather than challenging or affirming the assumptions made in psychometric testing -- that personality is a meaningful and measurable construct, and that personality traits are indicative of future success on the job -- we frame our methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves. Our main contribution is the development of a socio-technical framework for auditing the stability of algorithmic systems. This contribution is supplemented with an open-source software library that implements the technical components of the audit, and can be used to conduct similar stability audits of algorithmic systems. We instantiate our framework with the audit of two real-world personality prediction systems, namely Humantic AI and Crystal. The application of our audit framework demonstrates that both these systems show substantial instability with respect to key facets of measurement, and hence cannot be considered valid testing instruments.

Frequent coauthors

Mona Sloane
University of Virginia
5 shared
Julia Stoyanovich
New York University
3 shared
Alene K. Rhea
3 shared
Kelsey Markey
New York University
2 shared
Anna Seo Gyeong Choi
Cornell University
2 shared
Lauren D’Arinzo
New York University
2 shared
Allison Koenecke
Cornell University
2 shared
K. Mei
University of Washington
2 shared
