Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Kosuke Imai

Professor of Government and Statistics · Verified

Harvard University · Biostatistics

Active 2000–2026

h-index: 67
Citations: 34.3k
Papers: 278 (137 in last 5y)
Funding: $665k
About

Professor Kosuke Imai advises a large number of graduate students across various subfields and disciplines, including statisticians, methodologists, and those focused on substantive research areas. He participates actively in dissertation committees and regularly engages with students through weekly research group meetings where he provides guidance on dissertation research. Professor Imai collaborates with graduate students on research projects, emphasizing the importance of strong training in statistics and computational skills, as well as independent thinking and initiative. He writes recommendation letters for both graduate and undergraduate students, prioritizing efficiency and requiring detailed application materials and a waiver of the right to access the letter. Professor Imai is committed to supporting students once they are admitted to PhD programs and encourages early communication regarding dissertation committee participation and research collaboration.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Mathematics
  • Sociology
  • Statistics
  • Political Science
  • Data Mining
  • Machine Learning
  • Econometrics
  • Demography
  • Geography
  • Law
  • Economics
  • Gender studies
  • Algorithm
  • Cartography
  • Engineering
  • Biology
  • Anthropology
  • Mathematical optimization
  • Actuarial science
  • Political economy

Selected publications

  • Using Embedding Models to Improve Probabilistic Race Prediction

    arXiv (Cornell University) · 2026-04-24

    preprint · Open access · Senior author

    Estimating racial disparity requires individual-level race data, which are often unavailable due to the sensitivity of collecting such information. To address this problem, many researchers utilize Bayesian Improved Surname Geocoding (BISG), which has critically relied on Census surname data. Unfortunately, these data capture race-surname relationships only for common surnames, omitting approximately 10% of the US population. We show that predictive performance degrades substantially for individuals with such omitted, uncommon surnames because standard BISG implementation relies on an uninformative generic prior in these cases. To address this limitation, we propose embedding-powered BISG (eBISG), which uses pre-trained text embeddings to represent names as dense vectors and trains neural networks on 2020 Census surname and first-name data to estimate race probabilities for names not covered in the Census. We compare five approaches: standard BISG using only surnames, BIFSG incorporating first name probabilities, surname embedding for unlisted names, surname and first name embedding combining both, and a full-name embedding trained on voter file data from Southern states that captures interactions between name components. We show that each successive eBISG approach improves race prediction, with the full-name embedding yielding the largest gains, particularly for Hispanic and Asian voters whose surnames are absent from the Census list.

  • An Axiomatic Foundation for Decisions with Counterfactual Utility

    ArXiv.org · 2026-05-06

    article · Open access

    Counterfactual utilities evaluate decisions not only by the realized outcome under a given decision, but also by the counterfactual outcomes that would arise under alternative decisions. By generalizing standard utility frameworks, they allow decision-makers to encode asymmetric criteria, such as avoiding harm and anticipating regret. Recent work, however, has raised fundamental concerns about the coherence and transitivity of counterfactual utilities. We address these concerns by extending the von Neumann-Morgenstern (vNM) framework to preferences defined on the extended space of all potential outcomes rather than realized outcomes alone. We show that expected counterfactual utility satisfies the vNM axioms on this extended domain, thereby admitting a coherent preference representation. We further examine how counterfactual preferences map onto the realized outcome space through menu-dependent and context-dependent projections. This axiomatic framework reconciles apparent inconsistencies highlighted by the Russian roulette example in the statistics literature and resolves the well-known Allais paradox from behavioral economics. We also derive an additional axiom required to reduce counterfactual utilities to standard utilities on the same potential outcome space, and establish an axiomatic foundation for additive counterfactual utilities, which satisfy a necessary and sufficient condition for point identification. Finally, we show that our results hold regardless of whether individual potential outcomes are deterministic or stochastic.

  • Improving Minority Population Sampling with BISG Probabilities: Evidence from a Survey of Jewish Americans

    arXiv (Cornell University) · 2026-05-06

    preprint · Open access

    Sampling geographically dispersed minority populations poses substantial challenges when individual group membership cannot be directly observed. Although stratified sampling can offer efficiency gains, these gains are typically modest unless the minority population is highly concentrated within a small number of strata. In this paper, we propose using Bayesian Improved Surname Geocoding (BISG) to enhance the efficiency of minority population sampling. BISG generates individual-level probabilities of minority group membership based on names and residential addresses. We incorporate these probabilities into a stratified Poisson probability sampling design. Applying the proposed approach to a national survey of Jewish Americans, we find that our estimates closely align with those from a large-scale Pew Research Center survey of the same population, which relied on a substantially more expensive sampling strategy involving geographic stratification and screening. At a fraction of the cost, our survey reproduces nearly identical patterns observed by Pew, including estimates of religious denominations and participation in specific religious activities.

  • A Statistical Model of Bipartite Networks: Application to Cosponsorship in the United States Senate

    Political Analysis · 2025-09-17 · 1 citation

    article · Open access · Senior author

    Many networks in political and social research are bipartite, connecting two distinct node types. A common example is cosponsorship networks, where legislators are linked through the bills they support. However, most bipartite network analyses in political science rely on statistical models fitted to a “projected” unipartite network. This approach can lead to aggregation bias and an artificially high degree of clustering, invalidating the study of group roles in network formation. To address these issues, we develop a statistical model of bipartite networks theorized to arise from group interactions, extending the mixed-membership stochastic blockmodel. Our model identifies groups within each node type that exhibit common edge formation patterns and incorporates node and dyad-level covariates as predictors of group membership and observed dyadic relations. We derive an efficient computational algorithm to fit the model and apply it to cosponsorship data from the United States Senate. We show that senators who were perfectly split along party lines remained productive and passed major legislation by forming non-partisan, power-brokering coalitions that found common ground through low-stakes bills. We also find evidence of reciprocity norms and policy expertise impacting cosponsorships. An open-source software package is available for researchers to replicate these insights.

  • Spatiotemporal causal inference with arbitrary spillover and carryover effects: Airstrikes and insurgent violence in the Iraq War

    arXiv (Cornell University) · 2025-04-04

    preprint · Open access

    Social scientists now routinely draw on high-frequency, high-granularity “microlevel” data to estimate the causal effects of subnational interventions. To date, most researchers aggregate these data into panels, often tied to large-scale administrative units. This approach has two limitations. First, data (over)aggregation obscures valuable spatial and temporal information, heightening the risk of mistaken inferences. Second, existing panel approaches either ignore spatial spillover and temporal carryover effects completely or impose overly restrictive assumptions. We introduce a general methodological framework and an accompanying open-source R package, geocausal, that enable spatiotemporal causal inference with arbitrary spillover and carryover effects. Using this framework, we demonstrate how to define and estimate causal quantities of interest, explore heterogeneous treatment effects, conduct causal mediation analysis, and perform data visualization. We apply our methodology to the Iraq War (2003-11), where we reexamine long-standing questions about the effects of airstrikes on insurgent violence.

  • Does AI help humans make better decisions? A statistical evaluation framework for experimental and observational studies

    Proceedings of the National Academy of Sciences · 2025-09-17 · 2 citations

    article · Open access · Corresponding

    The use of AI, or more generally data-driven algorithms, has become ubiquitous in today's society. Yet, in many cases and especially when stakes are high, humans still make final decisions. The critical question, therefore, is whether AI helps humans make better decisions compared to a human-alone or AI-alone system. We introduce a methodological framework to answer this question empirically with minimal assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded and unconfounded treatment assignment, in which the provision of AI-generated recommendations is assumed to be randomized across cases, conditional on observed covariates, with final decisions made by humans. Under this study design, we show how to compare the performance of three alternative decision-making systems: human-alone, human-with-AI, and AI-alone. Importantly, the AI-alone system encompasses any individualized treatment assignment, including those not used in the original study. We also show when AI recommendations should be provided to a human decision maker, and when one should follow such recommendations. We apply the proposed methodology to our own randomized controlled trial evaluating a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Furthermore, replacing a human judge with algorithms (the risk assessment score and a large language model in particular) yields worse classification performance.

  • Estimating the Partisan Bias of Japanese Legislative Redistricting Plans Using a Simulation Algorithm

    2025-07-11

    preprint · Open access · Senior author

    Although partisan gerrymandering has been found to be widespread for congressional redistricting in the United States, there exists little empirical research on legislative redistricting in other countries. We investigate the partisan bias of Japanese redistricting. Some scholars have argued that the prominent role played by the nonpartisan commission leaves little room for partisan gerrymandering. Others have pointed out, however, that the Japanese redistricting process may be subject to political influence because the members of the redistricting commission must be appointed by the Prime Minister and approved by the Diet. In addition, the commission invites the governor of each prefecture to provide their opinions on redistricting. We conduct a systematic empirical analysis to estimate the partisan bias of the 2022 Japanese Lower House redistricting plans. We apply a state-of-the-art redistricting simulation algorithm to generate a large number of alternative nonpartisan redistricting plans. The sampled plans are representative of the population of plans and are consistent with the redistricting rules with which the commission must comply. By comparing the enacted plan with this nonpartisan baseline, we quantify the extent to which the enacted plan favors a particular party. Unlike the traditional methods, our simulation approach accounts for political geography and redistricting rules specific to each prefecture. Our analysis shows that the Japanese redistricting process yields little partisan bias at both the prefecture and district levels.

Frequent coauthors

  • Brandon De La Cuesta

    Stanford University

    1065 shared
  • Naoki Egami

    Columbia University

    1064 shared
  • Christopher T Kenny

    206 shared
  • Benjamin Fifield

    Quantitative BioSciences

    205 shared
  • Jun Kawahara

    187 shared
  • In Song Kim

    131 shared
  • Steven Liao

    Providence College

    111 shared
  • Gary King

    Harvard University Press

    95 shared

Education

  • Ph.D., Statistics

    Harvard University

    2006
  • M.A., Statistics

    Harvard University

    2003
  • B.A., Mathematics

    University of Tokyo

    1999