
Natasha Holmes
Ann S. Bowers Associate Professor · Astronomy, CDER, Physics · Cornell University · Physics
Active 2009–2025
About
Natasha Holmes is an Ann S. Bowers Associate Professor in the Department of Physics at Cornell University. Her research focuses on teaching and learning in physics and other STEM courses, exploring how students acquire skills and content knowledge and how different course environments influence student motivation, persistence, and understanding of science and measurement. Her work employs both qualitative and quantitative methods to evaluate variables affecting student learning and experiences in physics and STEM education. Holmes's primary research area is the efficacy of hands-on laboratory courses. She investigates assessment methods to determine what labs achieve and how teaching methods can improve learning outcomes. Her team has developed instruments to assess critical thinking and reasoning about uncertainty and measurement, and they evaluate the impact of various pedagogical strategies, including course redesigns and research experiences. Additionally, her research explores student experiences such as group work dynamics, social biases, and the relationship between coursework and undergraduate research experiences, aiming to inform more effective teaching practices in physics education.
Research topics
- Computer Science
- Mathematics education
- Psychology
- Political Science
- Sociology
- Artificial Intelligence
- Social Science
- Multimedia
- Library science
- Medicine
- Human–computer interaction
- Medical education
Selected publications
Invest in science education research to make science open to all
Nature Physics · 2025-08-25
article · 1st author · Corresponding
Bias in physics peer recognition does not explain gaps in perceived peer recognition
Nature Physics · 2025-03-05 · 5 citations
article · Senior author
Structuring groups for gender equitable equipment usage in labs
Physical Review Physics Education Research · 2025-06-03 · 1 citation
article · Open access · Senior author
Previous research has found gender inequitable equipment usage across various lab course contexts. Few studies, however, have tested possible remediation strategies. In this work, we use hierarchical linear modeling to compare men and women’s lab equipment usage in two group work structures across three course contexts. In one in-person course, students formed their own groups in class and rotated into new groups every unit. In the other two courses, one in person and one remote, students were assigned groups formed by the instructor and worked with the same group all semester. In line with former studies, we found gender inequitable equipment usage in the course with in-class formed, rotated groups. We did not observe gender inequitable equipment usage, however, in the course with instructor-assigned, fixed groups. Analyzing equipment usage across the semester within each course, our results suggest that this improvement comes from a combination of both instructor-assigned groups and keeping groups fixed for the semester. Our findings present many opportunities for subsequent controlled studies to probe these practices.
Perceptions of interdisciplinary critical thinking among biology and physics undergraduates
Physical Review Physics Education Research · 2025-04-09 · 4 citations
article · Open access · Senior author
There is a growing need for more effective interdisciplinary science instruction across undergraduate degree programs. In addition to supporting students’ connections between disciplinary concepts, interdisciplinary learning can develop students’ critical thinking skills and allow them to evaluate scientific investigations and claims between diverse topics. Physics Education Research literature has particularly focused on introductory physics courses for life sciences students, in part because students majoring in life sciences represent one of the largest demographics enrolled in physics courses. This literature has primarily focused on students’ development of conceptual understanding, modeling skills, and perspectives of the two fields. In this study, we explored how biology and physics undergraduates approach and perceive critical thinking between the two disciplines. We conducted structured think-aloud interviews with biology and physics students, asking students to first complete portions of established biology and physics critical thinking assessments and then respond to several follow-up questions about critical thinking more generally. Using thematic analysis to inductively code interview responses into emergent themes, we found that most students, regardless of major, described different approaches to evaluating biology and physics experiments. However, physics students provided similar definitions of critical thinking in the two disciplines, while biology students provided similar and different definitions in almost equal numbers. The exception was related to the use of quantitative methods solely being associated with critical thinking in physics, despite both critical thinking assessments involving quantitative data analysis. When looking across constructs, we saw no clear trends or relationships between individual students’ responses to each of the interview questions.
We also explored students’ broader perspectives on the two fields and found that physics students assume that physics is needed to understand biology but not vice versa, which did not align with their perspectives on critical thinking between disciplines. We use this complexity to motivate future work to understand the impact of biology and physics instruction, as well as other STEM disciplines, on developing students’ critical thinking skills and perceptions.
Comparing large language models for supervised analysis of students’ lab notes
Physical Review Physics Education Research · 2025-03-31 · 6 citations
article · Open access · Senior author
Recent advancements in large language models (LLMs) hold significant promise for improving physics education research that uses machine learning. In this study, we compare the application of various models for conducting a large-scale analysis of written text grounded in a physics education research classification problem: identifying skills in students’ typed lab notes through sentence-level labeling. Specifically, we use training data to fine-tune two different LLMs, BERT and LLaMA, and compare the performance of these models to both a traditional bag-of-words approach and a few-shot LLM (without fine-tuning). We evaluate the models based on their resource use, performance metrics, and research outcomes when identifying skills in lab notes. We find that higher-resource models often, but not necessarily, perform better than lower-resource models. We also find that all models report similar trends in research outcomes, although the absolute values of the estimated measurements are not always within uncertainties of each other. We use the results to discuss relevant considerations for education researchers seeking to select a model type for use as a classifier.
Dynamics of productive confirmation framing in an introductory lab
Physical Review Physics Education Research · 2024-08-23 · 3 citations
article · Open access
In introductory physics laboratory instruction, students often expect to confirm or demonstrate textbook physics concepts. This expectation is largely undesirable: labs that emphasize confirmation of textbook physics concepts are generally unsuccessful at teaching those concepts and even in contexts that do not emphasize confirmation, such expectations can lead to students disregarding or manipulating their data in order to obtain the expected result. In other words, when students expect their lab activities to confirm a known result, they may relinquish epistemic agency and violate disciplinary practices. We present a contrasting case where, we claim, confirmatory expectations can actually support productive disciplinary engagement. In this case study, we analyze the complex dynamics of students’ epistemological framing in a lab where students’ confirmatory expectations support and even generate epistemic agency and disciplinary practices, including developing original ideas, measures, and apparatuses to apply to the material world. Published by the American Physical Society 2024
Method to assess the trustworthiness of machine coding at scale
Physical Review Physics Education Research · 2024-03-06 · 3 citations
article · Open access · Senior author
Physics education researchers are interested in using the tools of machine learning and natural language processing to make quantitative claims from natural language and text data, such as open-ended responses to survey questions. The aspiration is that this form of machine coding may be more efficient and consistent than human coding, allowing much larger and broader datasets to be analyzed than is practical with human coders. Existing work that uses these tools, however, does not investigate norms that allow for trustworthy quantitative claims without full reliance on cross-checking with human coding, which defeats the purpose of using these automated tools. Here we propose a four-part method for making such claims with supervised natural language processing: evaluating a trained model, calculating statistical uncertainty, calculating systematic uncertainty from the trained algorithm, and calculating systematic uncertainty from novel data sources. We provide evidence for this method using data from two distinct short response survey questions with two distinct coding schemes. We also provide a real-world example of using these practices to machine code a dataset unseen by human coders. We offer recommendations to guide physics education researchers who may use machine-coding methods in the future. Published by the American Physical Society 2024
Applying machine learning models in multi-institutional studies can generate bias
2024-09-12
article · Open access · Senior author
There is increasing interest in deploying machine learning models at scale for multi-institutional studies in physics education research. Here we investigate the efficacy of applying machine learning models to institutions outside of their training set, using natural language processing to code open-ended survey responses. We find that, in general, changing institutional contexts can affect machine learning estimates of code frequencies: either previously documented sources of uncertainty increase in magnitude, new unknown sources of uncertainty emerge, or both. We also find an example where uncertainties do not change between the institution used in the training data and an institution not in the training data. Results suggest that attention to uncertainty is critical, especially when making measurements of student writing across multi-institutional data sets.
What topics of peer interactions correlate with student performance in physics courses?
European Journal of Physics · 2024-03-19 · 2 citations
article · Open access · Senior author · Corresponding
Research suggests that interacting with more peers about physics course material is correlated with higher student performance. Some studies, however, have demonstrated that different topics of peer interactions may correlate with their performance in different ways, or possibly not at all. In this study, we probe both the peers with whom students interact about their physics course and the particular aspects of the course material about which they interacted in six different introductory physics courses: four lecture courses and two lab courses. Drawing on social network analysis methods, we replicate prior work demonstrating that, on average, students who interact with more peers in their physics courses have higher final course grades. Expanding on this result, we find that students discuss a wide range of aspects of course material with their peers: concepts, small-group work, assessments, lecture, and homework. We observe that in the lecture courses, interacting with peers about concepts is most strongly correlated with final course grade, with smaller correlations also arising for small-group work and homework. In the lab courses, on the other hand, small-group work is the only interaction topic that significantly correlates with final course grade. We use these findings to discuss how course structures (e.g. grading schemes and weekly course schedules) may shape student interactions and add nuance to prior work by identifying how specific types of student interactions are associated (or not) with performance.
Comparing large language models for supervised analysis of students' lab notes
arXiv (Cornell University) · 2024-12-13
preprint · Open access · Senior author
Recent advancements in large language models (LLMs) hold significant promise in improving physics education research that uses machine learning. In this study, we compare the application of various models to perform large-scale analysis of written text grounded in a physics education research classification problem: identifying skills in students' typed lab notes through sentence-level labeling. Specifically, we use training data to fine-tune two different LLMs, BERT and LLaMA, and compare the performance of these models to both a traditional bag of words approach and a few-shot LLM (without fine-tuning). We evaluate the models based on their resource use, performance metrics, and research outcomes when identifying skills in lab notes. We find that higher-resource models often, but not necessarily, perform better than lower-resource models. We also find that all models estimate similar trends in research outcomes, although the absolute values of the estimated measurements are not always within uncertainties of each other. We use the results to discuss relevant considerations for education researchers seeking to select a model type to use as a classifier.
Recent grants
Collaborative Research: Student Thinking About Measurements Across the Physics Curriculum
NSF · $160k · 2018–2023
NSF · $292k · 2020–2023
Studying Equity in Undergraduate Physics Labs
NSF · $335k · 2019–2024
Frequent coauthors
- 23 shared
Emily M. Smith
University of Nevada, Reno
- 23 shared
Emily M. Stump
Cornell University
- 21 shared
Meagan Sundstrom
Cornell University
- 18 shared
Cole Walsh
- 17 shared
Carl Wieman
Stanford University
- 14 shared
Gina Passante
California State University, Fullerton
- 11 shared
Ashley B. Heim
Cornell University
- 10 shared
Katherine N. Quinn
Princeton University
Education
- 2014
PhD, Physics and Astronomy
University of British Columbia
- 2011
M.Sc., Physics and Astronomy
University of British Columbia
- 2009
B.Sc.(Hons), Physics
University of Guelph
Awards & honors
- Endowed professorships (23 faculty)
- NSF-funded postdocs to research education across disciplines
- Provost’s seminar celebrates innovation in teaching