James Zou

Assistant Professor of Biomedical Data Science · Faculty Director, AI for Health · Verified

Stanford University · Rheumatology

Active 2007–2026

h-index: 83
Citations: 40.1k
Papers: 645 (449 in the last 5 years)
Funding: $10.1M (1 active)

About

James Zou is an Assistant Professor of Biomedical Data Science and, by courtesy, of Computer Science and of Electrical Engineering at Stanford University. He is affiliated with the Center for Artificial Intelligence in Medicine & Imaging (AIMI). His research focuses on artificial intelligence in healthcare, leveraging machine learning and data science to advance medical imaging and biomedical applications. Zou's work involves developing innovative algorithms and computational methods to improve diagnosis, treatment, and understanding of medical conditions, contributing to the integration of AI technologies into clinical practice.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Medicine
  • Internal medicine
  • Machine Learning
  • Cardiology
  • Data science
  • Computer Security
  • Political Science
  • Biology
  • Genetics
  • Engineering
  • Psychology
  • Algorithm
  • Software engineering
  • Computational biology
  • Ophthalmology
  • Cell biology
  • Database
  • Evolutionary biology
  • Law
  • Risk analysis (engineering)
  • Pharmacology
  • Radiology

Selected publications

  • Evaluation-driven Scaling for Scientific Discovery

    ArXiv.org · 2026-04-21

    article · Open access

    Language models are increasingly used in scientific discovery to generate hypotheses, propose candidate solutions, implement systems, and iteratively refine them. At the core of these trial-and-error loops lies evaluation: the process of obtaining feedback on candidate solutions via verifiers, simulators, or task-specific scoring functions. While prior work has highlighted the importance of evaluation, it has not explicitly formulated the problem of how evaluation-driven discovery loops can be scaled up in a principled and effective manner to push the boundaries of scientific discovery, a problem this paper seeks to address. We introduce Simple Test-time Evaluation-driven Scaling (SimpleTES), a general framework that strategically combines parallel exploration, feedback-driven refinement, and local selection, revealing substantial gains unlocked by scaling evaluation-driven discovery loops along the right dimensions. Across 21 scientific problems spanning six domains, SimpleTES discovers state-of-the-art solutions using gpt-oss models, consistently outperforming both frontier-model baselines and sophisticated optimization pipelines. Particularly, we sped up the widely used LASSO algorithm by over 2x, designed quantum circuit routing policies that reduce gate overhead by 24.5%, and discovered new Erdos minimum overlap constructions that surpass the best-known results. Beyond novel discoveries, SimpleTES produces trajectory-level histories that naturally supervise feedback-driven learning. When post-trained on successful trajectories, models not only improve efficiency on seen problems but also generalize to unseen problems, discovering solutions that base models fail to uncover. Together, our results establish effective evaluation-driven loop scaling as a central axis for advancing LLM-driven scientific discovery, and provide a simple yet practical framework for realizing these gains.

  • Artificial intelligence agents in cancer research and oncology

    Nature reviews. Cancer · 2026-01-12 · 4 citations

    article
  • Reasoning or Knowledge: Stratified Evaluation of Biomedical LLMs

    Underline Science Inc. · 2026-03-06

    other · Open access · Senior author

    Medical reasoning in large language models seeks to replicate clinicians' cognitive processes in interpreting patient data and making diagnostic decisions. However, widely used benchmarks—such as MedQA, MedMCQA, and PubMedQA—mix questions that require multi-step reasoning with those answerable through factual recall, complicating evaluation. We demonstrate this by training a PubMedBERT-based classifier on expert-curated labels and applying it to 11 widely used biomedical QA benchmarks, where we find that only 32.8% of the questions require multi-step reasoning, indicating that current evaluations largely measure recall. This stratified evaluation of biomedical models (HuatuoGPT-o1, MedReason, m1) and general-domain models (DeepSeek-R1, o4-mini, Qwen3) reveals consistently lower performance on reasoning versus knowledge (e.g., HuatuoGPT-o1: 56.9% vs. 44.8%). Beyond accuracy, we assess robustness through adversarial evaluations in which models are prefixed with uncertainty-inducing statements; biomedical reasoning models degrade sharply in this setting (e.g., MedReason: 50.4% → 24.4%), with declines especially pronounced on reasoning-heavy questions. Finally, we show that fine-tuning on high-quality reasoning examples augmented with adversarial traces, followed by reinforcement learning with GRPO, improves both robustness and accuracy across knowledge and reasoning subsets.

  • The Virtual Biotech: A Multi-Agent AI Framework for Therapeutic Discovery and Development

    bioRxiv (Cold Spring Harbor Laboratory) · 2026-02-23 · 1 citation

    article · Open access · Senior author · Corresponding

    Drug discovery and development requires integrating diverse evidence across biological scales and data modalities. However, relevant data, tools, and expertise remain fragmented across teams and organizations, making integration difficult. To address these challenges, we introduce the Virtual Biotech, a coordinated team of AI agents that mirrors the structure of human therapeutic research organizations to support end-to-end computational discovery. The Virtual Biotech is led by a Chief Scientific Officer agent that receives scientific queries, delegates them to domain-specialized scientist agents, and integrates their outputs through data-driven reasoning. Scientist agents leverage complementary tools and knowledge sources spanning statistical genetics, functional genomics, pathways and interactions, chemoinformatics, disease biology, and clinical data. We showcase the Virtual Biotech across three translational applications. First, the agents autonomously annotated and analyzed outcomes from 55,984 clinical trials to identify genomic features of drug targets associated with trial success. More than 37,000 clinical-trialist agents curated structured trial outcomes and linked targets to multi-omic annotations, including cell-type-specific features derived by the agents from single-cell RNA-sequencing atlases. The agents discovered that drugs targeting cell-type-specific genes were 40% more likely to progress from Phase I to Phase II and 48% more likely to reach market (Phase IV), while exhibiting 32% lower adverse event rates. Second, the Virtual Biotech evaluated B7-H3 as a lung cancer target, integrating statistical genetics, single-cell, spatial, and clinicogenomic evidence to propose an antibody–drug conjugate strategy while identifying key liabilities and differentiation opportunities. Third, the platform analyzed a terminated ulcerative colitis trial targeting OSMR β to infer potential failure mechanisms and proposed biomarker-guided enrollment strategies to address precision-medicine gaps. Together, these results illustrate how the Virtual Biotech can enable more transparent, efficient, and comprehensive multi-scale therapeutic analyses, helping to accelerate early-stage drug discovery workflows while keeping human scientists in the loop.

  • Science Across Languages: Assessing LLM Multilingual Translation of Scientific Papers

    Underline Science Inc. · 2026-03-06

    other · Open access · Senior author

    Scientific research is inherently global. However, the vast majority of academic journals are published exclusively in English, creating barriers for non-native-English-speaking researchers. In this study, we leverage large language models (LLMs) to translate published scientific articles while preserving their native JATS XML formatting, thereby developing a practical, automated approach for implementation by academic journals. Using our approach, we translate articles across multiple scientific disciplines into 28 languages. To evaluate translation accuracy, we introduce a novel question-and-answer (QA) benchmarking method and show an average performance of 95.9%, indicating that the key scientific details are accurately conveyed. In a user study, we translate the scientific papers of 15 researchers into their native languages. Interestingly, a third of the authors found many technical terms “overtranslated,” expressing a preference to keep terminology more familiar in English untranslated. Finally, we demonstrate how in-context learning techniques can be used to align translations with domain-specific preferences such as mitigating overtranslation, highlighting the adaptability and utility of LLM-driven scientific translation.

  • Leveraging multi-modal foundation models for analysing spatial multi-omic and histopathology data

    Nature Biomedical Engineering · 2026-02-05

    article
  • Reimagining human-centric drug development with new approach methodologies

    Science · 2026-04-16

    article · Open access

    Despite unprecedented technological progress, most drug candidates continue to fail in clinical trials, reflecting a persistent gap between preclinical models and human biology. New approach methodologies (NAMs), by spanning human-derived cellular systems, microphysiological platforms, and artificial intelligence, offer a paradigm shift in human-centric drug development and biomedical research. Recent regulatory reforms, such as the US Food and Drug Administration (FDA) Modernization Act 3.0, have begun to position NAMs as a complement to or replacement for animal testing. This Review synthesizes emerging biological and computational NAMs and examines how their integration is reshaping drug development. We also discuss regulatory and ethical frameworks enabling this transition and outline a roadmap for embedding human-based science in a predictive, efficient, and ethically grounded infrastructure of human-centered drug development.

  • AI Agents for Data Science: A Discussion of “LAMBDA: A Large Model Based Data Agent”

    Journal of the American Statistical Association · 2026-01-02

    article · 1st author · Corresponding
  • SyntheMol-RL: a flexible reinforcement learning framework for designing easily synthesizable antibiotics

    Molecular Systems Biology · 2026-04-23

    article · Open access

    The rise of antibiotic-resistant pathogens such as Staphylococcus aureus has created an urgent need for new antibiotics. Generative artificial intelligence (AI) has shown promise in drug discovery, but existing models often fail to propose compounds that are both effective and synthetically tractable. To address these challenges, we introduce SyntheMol-RL, a reinforcement learning-based generative model that can rapidly design synthetically accessible small-molecule drug candidates from a massive chemical space of 46 billion compounds. SyntheMol-RL improves upon our prior Monte Carlo tree search (MCTS)-based SyntheMol model by generalizing across chemically similar building blocks and enabling multi-parameter optimization. We applied SyntheMol-RL to generate candidate antibiotics against S. aureus by optimizing for both antibacterial activity and aqueous solubility, and we found that SyntheMol-RL generated molecules with improved predicted properties compared to both the previous MCTS version of SyntheMol as well as an AI-based virtual screening baseline. We synthesized 79 SyntheMol-RL compounds that were unique relative to the training dataset and found that 13 showed potent in vitro activity, of which seven passed our structural novelty filters that compared them to known antibiotics. Furthermore, one hit compound, synthecin, demonstrated efficacy in a murine wound infection model of methicillin-resistant S. aureus (MRSA). These results validate SyntheMol-RL's ability to generate synthetically accessible candidate antibiotics and position SyntheMol-RL as a powerful tool for drug design across therapeutic domains.

  • Improving LLM Group Fairness on Tabular Data via In-Context Learning

    Proceedings of the AAAI/ACM Conference on AI Ethics and Society · 2025-10-15 · 1 citation

    article · Open access · Senior author

    Large language models (LLMs) have been shown to be effective on tabular prediction tasks in the low-data regime, leveraging their internal knowledge and ability to learn from instructions and examples. However, LLMs can fail to generate predictions that satisfy group fairness, that is, produce equitable outcomes across groups. Critically, conventional debiasing approaches for natural language tasks do not directly translate to mitigating group unfairness in tabular settings. In this work, we systematically investigate four empirical approaches to improve group fairness of LLM predictions on tabular datasets, including fair prompt optimization, soft prompt tuning, strategic selection of few-shot examples, and self-refining predictions via chain-of-thought reasoning. Through experiments on four tabular datasets using both open-source and proprietary LLMs, we show the effectiveness of these methods in enhancing demographic parity while maintaining high overall performance. Our analysis provides actionable insights for practitioners in selecting the most suitable approach based on their specific requirements and constraints.
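
For readers unfamiliar with the demographic parity metric mentioned in the last abstract above, the following is a minimal sketch of one common way to measure it: the gap in positive-prediction rate between groups. It is not taken from the paper. The function name `demographic_parity_gap` and the toy predictions and group labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def demographic_parity_gap(predictions, groups):
    """Return (gap, rates): the largest difference in positive-prediction
    rate between any two groups, and the per-group rates themselves."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    # Positive-prediction rate for each group.
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical toy data: binary predictions and a group label per example.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_gap(preds, groups)
print(rates)  # per-group positive-prediction rates
print(gap)    # 0 would mean perfect demographic parity on this data
```

A gap near zero means the groups receive positive predictions at similar rates. The metric says nothing about accuracy, which is presumably why the abstract reports it alongside overall performance.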

Frequent coauthors

  • Zhenqin Wu

    Enable Biosciences (United States)

    81 shared
  • Eric Q. Wu

    79 shared
  • Alexandro E. Trevino

    59 shared
  • Aaron T. Mayer

    Enable Biosciences (United States)

    59 shared
  • Martin Jinye Zhang

    Harvard University

    58 shared
  • David Ouyang

    Cedars-Sinai Smidt Heart Institute

    58 shared
  • B Bernstein

    Broad Institute

    51 shared
  • Bryan He

    46 shared

Education

  • Ph.D., Biomedical Data Science

    Stanford University

    2018
  • M.S., Computer Science

    Stanford University

    2013
  • B.S., Electrical Engineering and Computer Science

    University of California, Berkeley

    2011