Sarah Moeller

Ph.D. · Verified

University of Florida · Linguistics

Active 1998–2026

h-index: 13
Citations: 518
Papers: 42 (18 in the last 5 years)

About

Sarah Moeller is an Assistant Professor of Computational Language Science at the University of Florida, based in Turlington 4129, Gainesville, FL. She is an interdisciplinary researcher dedicated to bridging the technological and knowledge gaps between natural language processing (NLP) and traditional linguistics. Her research fosters a virtuous cycle where NLP contributes to the scientific study of minority languages, and in turn, the study of these languages expands NLP and AI to new linguistic contexts. Her special interest lies in the languages of the former Soviet Union, where she has engaged in fieldwork with Nakh-Daghestanian languages and encountered language endangerment firsthand. Moeller's work explores how best practices in linguistics might evolve with AI integration and how computational linguistics can empower both academic and community linguists to leverage AI positively. Before her academic career, she gained extensive experience teaching English as a Foreign Language, working as a freelance Russian-English interpreter, and spending several years in the NLP industry. She is passionate about helping linguists and individuals from humanities backgrounds adopt computational methods. Her educational background includes a Ph.D. in Linguistics and Cognitive Science from the University of Colorado Boulder, an M.A. in Applied Linguistics from Dallas International University, and a B.A. in History from Thomas Edison State College.

Research topics

  • Computer Science
  • Natural Language Processing
  • Linguistics
  • Artificial Intelligence
  • Programming language
  • Philosophy
  • Epistemology
  • Psychology
  • Cognitive science
  • Engineering

Selected publications

  • Computational Methods for Language Documentation and Description

    Annual Review of Linguistics · 2026-01-30

    article · Open access · 1st author · Corresponding

    In this era of rapid artificial intelligence (AI) expansion, computational approaches are reshaping methods for language documentation and description. We survey the history of computational methods that have been applied to research in languages with limited digital resources and also present cutting-edge methods, such as large language models (LLMs), that have the potential to benefit documentary and descriptive fieldwork. We highlight how these methods affect data collection and annotation, transcription and phonological analysis, morphosyntactic description, and translation. Linguists, natural language processing engineers, and speech communities must consider how the use of computational methods such as data mining and machine learning should influence ethical best practices in linguistic field methods and how communities can continue to guide the documentation and maintenance of their languages in the age of AI. Looking forward, LLMs and making computational methods broadly usable through user interfaces are likely to emerge as prominent themes in documentary and descriptive research.

  • Analysis of LLM as a grammatical feature tagger for African American English

    ArXiv.org · 2025-02-09

    preprint · Open access

    African American English (AAE) presents unique challenges in natural language processing (NLP). This research systematically compares the performance of available NLP models--rule-based, transformer-based, and large language models (LLMs)--capable of identifying key grammatical features of AAE, namely Habitual Be and Multiple Negation. These features were selected for their distinct grammatical complexity and frequency of occurrence. The evaluation involved sentence-level binary classification tasks, using both zero-shot and few-shot strategies. The analysis reveals that while LLMs show promise compared to the baseline, they are influenced by biases such as recency and unrelated features in the text such as formality. This study highlights the necessity for improved model training and architectural adjustments to better accommodate AAE's unique linguistic characteristics. Data and code are available.

  • Analysis of LLM as a grammatical feature tagger for African American English

    2025-01-01

    article · Open access · Senior author

    African American English (AAE) presents unique challenges in natural language processing (NLP). This research systematically compares the performance of available NLP models--rule-based, transformer-based, and large language models (LLMs)--capable of identifying key grammatical features of AAE, namely Habitual Be and Multiple Negation. These features were selected for their distinct grammatical complexity and frequency of occurrence. The evaluation involved sentence-level binary classification tasks, using both zero-shot and few-shot strategies. The analysis reveals that while LLMs show promise compared to the baseline, they are influenced by biases such as recency and unrelated features in the text such as formality. This study highlights the necessity for improved model training and architectural adjustments to better accommodate AAE's unique linguistic characteristics. Data and code are available.

  • Challenges in Processing Chinese Texts Across Genres and Eras

    2025-01-01

    article · Open access · Senior author

    Pre-trained Chinese Natural Language Processing (NLP) tools show reduced performance when analyzing poetry compared to prose. This study investigates the discrepancies between tools trained on either Classical or Modern Chinese prose when handling Classical Chinese prose and Classical Chinese poetry. Three experiments reveal error patterns that indicate the weaker performance on Classical Chinese poems is due to challenges identifying word boundaries. Specifically, tools trained on Classical prose struggle recognizing word boundaries within Classical poetic structures and tools trained on Modern prose have difficulty with word segmentation in both Classical Chinese genres. These findings provide valuable insights into the limitations of current NLP tools for studying Classical Chinese literature.

  • A Meeting of Two Worlds—Oral History and Linguistics: Partnerships, Perplexities, and Potentialities in Researching African American Language

    The Oral History Review · 2025-07-03

    article · Senior author

  • Front Matter

    2024-01-01

    article · Open access · 1st author · Corresponding

    Sarah Moeller, Godfred Agyapong, Antti Arppe, Aditi Chaudhary, Shruti Rijhwani, Christopher Cox, Ryan Henke, Alexis Palmer, Daisy Rosenblum, Lane Schwartz. Proceedings of the Seventh Workshop on the Use of Computational Methods in the Study of Endangered Languages. 2024.

  • Machine-in-the-Loop with Documentary and Descriptive Linguists

    2024-01-01

    article · Open access · 1st author · Corresponding

    This paper describes a curriculum for teaching linguists how to apply machine-in-the-loop (MitL) approach to documentary and descriptive tasks. It also shares observations about the learning participants, who are primarily noncomputational linguists, and how they interact with the MitL approach. We found that they prefer cleaning over increasing the training data and then proceed to reanalyze their analytical decisions, before finally undertaking small actions that emphasize analytical strategies. Overall, participants display an understanding of the curriculum which covers fundamental concepts of machine learning and statistical modeling.

  • Leveraging syntactic dependencies in disambiguation: the case of African American English

    2024-04-01 · 1 citation

    preprint · Open access · Senior author

    African American English (AAE) has received recent attention in the field of natural language processing (NLP). Efforts to address bias against AAE in NLP systems tend to focus on lexical differences. Whenever the structural uniqueness of AAE is considered, the solution is often to remove or neutralize the differences. This work leverages knowledge about the unique morphosyntactic structures to improve automatic disambiguation of habitual and nonhabitual meanings of “be” in naturally produced AAE transcribed speech. Both meanings are employed in AAE but examples of Habitual be are rare in the already limited AAE data. Generally, representing contextual syntactic information improves semantic disambiguation of habituality. Using an ensemble of classical machine learning models with a representation of the unique POS and dependency patterns of Habitual be, we show that integrating syntactic information improves the identification of habitual uses of “be” by about 65 F1 points over a simple baseline model of n-grams, and as much as 74 points. The success of this approach demonstrates the potential impact when we embrace, rather than neutralize, the structural uniqueness of African American English.

  • A Comparison of Fine-Tuning and In-Context Learning for Clause-Level Morphosyntactic Alternation

    2024-01-01

    article · Open access

    This paper presents our submission to the AmericasNLP 2024 Shared Task on the Creation of Educational Materials for Indigenous Languages. We frame this task as one of morphological inflection generation, treating each sentence as a single word. We investigate and compare two distinct approaches: fine-tuning neural encoder-decoder models such as NLLB-200, and in-context learning with proprietary large language models (LLMs). Our findings demonstrate that for this task, no one approach is perfect. Anthropic's Claude 3 Opus, when supplied with grammatical description entries, achieves the highest performance on Bribri among the evaluated models. This outcome corroborates and extends previous research exploring the efficacy of in-context learning in low-resource settings. For Maya, fine-tuning NLLB-200-3.3B using StemCorrupt augmented data yielded the best performance.

  • The Bangla/Bengali Seed Dataset Submission to the WMT24 Open Language Data Initiative Shared Task

    2024-01-01

    article · Open access · Senior author

    We contribute a seed dataset for the Bangla/Bengali language as part of the WMT24 Open Language Data Initiative shared task. We validate the quality of the dataset against a mined and automatically aligned dataset (NLLBv1) and two other existing datasets of crowdsourced manual translations. The validation is performed by investigating the performance of state-of-the-art translation models fine-tuned on the different datasets after controlling for training set size. Machine translation models fine-tuned on our dataset outperform models tuned on the other datasets in both translation directions (English-Bangla and Bangla-English). These results confirm the quality of our dataset. We hope our dataset will support machine translation for the Bangla/Bengali community and related low-resource languages.

Frequent coauthors

  • Omri Abend

    30 shared
  • Jakob Prange

    Technische Hochschule Augsburg

    30 shared
  • Austin Blodgett

    DEVCOM Army Research Laboratory

    30 shared
  • Vivek Srikumar

    30 shared
  • Nathan Schneider

    Georgetown University

    30 shared
  • Jena D. Hwang

    Allen Institute

    30 shared
  • Adi Bitan

    27 shared
  • Aviram Stern

    University of Utah

    27 shared

Education

  • Ph.D., Linguistics and Cognitive Science

    University of Colorado Boulder

  • M.A., Applied Linguistics

    Dallas International University

  • B.A., History

    Thomas Edison State College
