
Melody Huang
Assistant Professor of Political Science · Yale University · Department of Political Science
Active 2018–2026
About
Melody Y. Huang is an Assistant Professor of Political Science and Statistics & Data Science at Yale University. Her research focuses on developing robust statistical methods for credibly estimating causal effects under real-world complications. Before her current position, she was a Postdoctoral Fellow at Harvard University, where she worked with Kosuke Imai. She earned her Ph.D. in Statistics from the University of California, Berkeley, where she was advised by Erin Hartman. She has taught at Yale University, the University of California, Berkeley, and the University of California, Los Angeles, covering statistical inference, causal inference, experimental design, quantitative methodology, linear models, and the fundamentals of big data.
Research topics
- Computer Science
- Econometrics
- Mathematics
- Statistics
- Artificial Intelligence
- Medicine
- Environmental health
- Economics
- Engineering
Selected publications
Journal of the Royal Statistical Society Series A (Statistics in Society) · 2026-01-10
Article · 1st author · Corresponding author
In a clustered observational study (COS), treatment is assigned to groups, and all units within a group are exposed to the treatment. Here, we use a COS design to estimate the effectiveness of Magnet nursing certification for emergency surgery patients. Recent research has introduced specialized weighting estimators for the COS design that balance baseline covariates at both the unit and cluster levels. These methods allow researchers to adjust for observed confounders but are sensitive to unobserved confounding. In this paper, we develop new sensitivity analysis methods tailored to weighting estimators for COS designs. We make three contributions. First, we introduce a bias decomposition tailored to the specific confounding structure that arises in a COS. Second, we develop a sensitivity framework for weighted COS designs that constrains the error in the underlying weights; we introduce both a marginal sensitivity model and a variance-based sensitivity model, and extend both to accommodate multiple estimands. Finally, we propose amplification and benchmarking methods to aid interpretation of the results. Throughout, we illustrate the proposed methods by analysing the effectiveness of Magnet nursing hospitals.
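To give a feel for what a marginal sensitivity model does to a weighting estimator, the sketch below bounds a normalized (Hajek) weighted mean when each true weight is allowed to differ from its estimate by at most a factor `lam`. This is a generic, unit-level illustration under that stated assumption, not the cluster-level model developed in the paper; `msm_bounds` and `lam` are hypothetical names chosen for the sketch. It relies on the fact that a ratio of weighted sums over box constraints is extremized by a threshold rule on the sorted outcomes.

```python
"""Generic sketch: sensitivity bounds for a normalized (Hajek) weighted mean.

Assumption (hypothetical, unit-level): each true weight h_i lies within a
factor `lam` of its estimate w_i, i.e. w_i / lam <= h_i <= w_i * lam.
This is NOT the cluster-level sensitivity model from the paper.
"""

from typing import Sequence, Tuple


def weighted_mean(y: Sequence[float], h: Sequence[float]) -> float:
    """Normalized weighted mean: sum(h * y) / sum(h)."""
    return sum(hi * yi for hi, yi in zip(h, y)) / sum(h)


def msm_bounds(y: Sequence[float], w: Sequence[float], lam: float) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the weighted mean over all admissible weights."""
    if lam < 1:
        raise ValueError("lam must be at least 1")
    # Sort units by outcome. The extremes of this ratio over box constraints
    # are attained by one extreme weight for every unit below a cutoff and the
    # other extreme weight for every unit above it, so we scan all cutoffs.
    order = sorted(range(len(y)), key=lambda i: y[i])
    ys = [y[i] for i in order]
    low = [w[i] / lam for i in order]
    high = [w[i] * lam for i in order]
    lower, upper = float("inf"), float("-inf")
    for k in range(len(ys) + 1):
        # Upper candidate: shrink the k smallest outcomes, inflate the rest.
        upper = max(upper, weighted_mean(ys, low[:k] + high[k:]))
        # Lower candidate: inflate the k smallest outcomes, shrink the rest.
        lower = min(lower, weighted_mean(ys, high[:k] + low[k:]))
    return lower, upper
```

With `lam = 1` the interval collapses to the point estimate; larger values of `lam` widen it, which is the quantity a sensitivity analysis then interprets (for example, by asking how large `lam` must be before the interval crosses zero).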
Distilling heterogeneous treatment effects: Stable subgroup estimation in causal inference
ArXiv.org · 2025-02-11
Preprint · Open access · 1st author · Corresponding author
Recent methodological developments have introduced new black-box approaches for estimating heterogeneous treatment effects; however, these methods fall short of providing interpretable characterizations of the individuals who may be most at risk, or who may benefit most, from receiving the treatment, which limits their practical utility. In this work, we introduce causal distillation trees (CDT) to estimate interpretable subgroups. CDT allows researchers to fit any machine learning model to estimate the heterogeneous treatment effect and then uses a simple, second-stage tree-based model to "distill" the estimated treatment effects into meaningful subgroups. As a result, CDT inherits the predictive gains of black-box machine learning models while preserving the interpretability of a simple decision tree. We derive theoretical guarantees for the consistency of the subgroups estimated by CDT and introduce stability-driven diagnostics that researchers can use to evaluate the quality of the estimated subgroups. We illustrate the proposed method on a randomized controlled trial of antiretroviral treatment for HIV from the AIDS Clinical Trials Group Study 175 and show that CDT outperforms state-of-the-art approaches in constructing stable, clinically relevant subgroups.
ArXiv.org · 2025-04-30
Preprint · Open access · 1st author · Corresponding author
Preprint of the Journal of the Royal Statistical Society Series A article on clustered observational studies listed above; the abstract is identical apart from minor wording.
Proceedings of the National Academy of Sciences · 2025-09-17 · 2 citations
Article · Open access
The use of AI, or more generally data-driven algorithms, has become ubiquitous in today's society. Yet in many cases, especially when stakes are high, humans still make the final decisions. The critical question, therefore, is whether AI helps humans make better decisions than a human-alone or an AI-alone system. We introduce a methodological framework for answering this question empirically with minimal assumptions. We measure a decision maker's ability to make correct decisions using standard classification metrics based on the baseline potential outcome. We consider a single-blinded and unconfounded treatment assignment, in which the provision of AI-generated recommendations is assumed to be randomized across cases, conditional on observed covariates, with final decisions made by humans. Under this study design, we show how to compare the performance of three alternative decision-making systems: human-alone, human-with-AI, and AI-alone. Importantly, the AI-alone system encompasses any individualized treatment assignment, including those not used in the original study. We also show when AI recommendations should be provided to a human decision maker and when such recommendations should be followed. We apply the proposed methodology to our own randomized controlled trial evaluating a pretrial risk assessment instrument. We find that the risk assessment recommendations do not improve the classification accuracy of a judge's decision to impose cash bail. Furthermore, replacing a human judge with algorithms (in particular, the risk assessment score and a large language model) yields worse classification performance.
senseweight: Sensitivity Analysis for Weighted Estimators
2025-08-22
Software · Open access · 1st author · Corresponding author
Provides tools to conduct interpretable sensitivity analyses for weighted estimators, introduced in Huang (2024) <doi:10.1093/jrsssa/qnae012> and Hartman and Huang (2024) <doi:10.1017/pan.2023.12>. The package allows researchers to generate the set of recommended sensitivity summaries for evaluating how sensitive their weighting estimators are to omitted moderators or confounders. The tools can be applied flexibly in causal inference settings (both external and internal validity) and in survey settings.
Relative Bias Under Imperfect Identification in Observational Causal Inference
ArXiv.org · 2025-07-31
Preprint · Open access · 1st author · Corresponding author
To conduct causal inference in observational settings, researchers must rely on identifying assumptions. In practice, these assumptions are unlikely to hold exactly. This paper considers the bias of selection-on-observables, instrumental variables (IV), and proximal inference estimates under violations of their identifying assumptions. We develop bias expressions for IV and proximal inference showing how violations of their respective assumptions are amplified by any unmeasured confounding in the outcome variable. We propose a set of sensitivity tools that quantify the sensitivity of different identification strategies, along with an augmented bias contour plot that visualizes the relationship between these strategies. We argue that choosing an identification strategy implicitly expresses a belief about the degree of violation that must be present in alternative identification strategies. Even when researchers intend to conduct an IV or proximal analysis, a sensitivity analysis comparing identification strategies can help clarify the implications of each set of assumptions. Throughout, we compare the approaches in a re-analysis of the impact of state surveillance on the incidence of protest in Communist Poland.
Design sensitivity and its implications for weighted observational studies
Journal of the Royal Statistical Society Series A (Statistics in Society) · 2025-07-15 · 2 citations
Article · 1st author · Corresponding author
Careful design and preregistration of a treated-control comparison in an observational study enhance the quality of its evidence. However, sensitivity to unmeasured confounding is not typically a primary consideration at the preanalysis design stage. We introduce a framework for weighted estimators that allows researchers to optimize for robustness to omitted variable bias at the design stage using a measure called design sensitivity. Inspired by a similar measure for matching estimators, design sensitivity describes the asymptotic power of a sensitivity analysis and allows researchers to transparently evaluate, before any outcome analysis, how different estimation strategies affect sensitivity to omitted confounders. We apply this general framework to two commonly used sensitivity models: the marginal sensitivity model and the variance-based sensitivity model. By comparing design sensitivities, we examine how key features of weighted observational designs, including trimming weights, choosing between different treatment versions, and altering the study’s inclusion criteria, affect robustness to unmeasured confounding. We illustrate the proposed framework on a study examining drivers of support for the 2016 Colombian peace agreement.
Overlap violations in external validity: Application to Ugandan cash transfer programs
The Annals of Applied Statistics · 2025-03-01 · 1 citation
Article · 1st author · Corresponding author
Estimating externally valid causal effects is a foundational problem in the social and biomedical sciences. Generalizing or transporting causal estimates from an experimental sample to a target population of interest relies on an overlap (or positivity) assumption between the experimental sample and the target population. In practice, full overlap between an experimental sample and a target population can be implausible. We introduce a framework for considering external validity in the presence of overlap violations. We propose a novel bias decomposition that parameterizes the bias from an overlap violation into two components: (1) the proportion of units omitted and (2) the degree to which omitting those units moderates the treatment effect. The decomposition offers an intuitive and straightforward way to conduct a sensitivity analysis assessing robustness to overlap violations. We also introduce a suite of sensitivity tools, in the form of summary measures and benchmarking, that help researchers assess the plausibility of the overlap violations. We illustrate the proposed framework on an experiment evaluating the impact of a cash transfer program in Northern Uganda.
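The two components of the decomposition can be motivated by a simple mixture identity. In notation chosen only for this sketch (the paper's exact parameterization may differ), let $\pi$ be the share of target-population units that lack overlap, $\tau_O$ the average effect among units with overlap, $\tau_{\bar O}$ the average effect among the omitted units, and $\tau$ the target average effect. If an estimator consistently recovers $\tau_O$, its bias for $\tau$ is:

```latex
\begin{aligned}
\tau &= (1 - \pi)\,\tau_{O} + \pi\,\tau_{\bar O}, \\
\operatorname{Bias} &= \tau_{O} - \tau = \pi\,(\tau_{O} - \tau_{\bar O}).
\end{aligned}
```

For instance, if 20% of the target population is omitted ($\pi = 0.2$) and the effect among overlap units exceeds the effect among omitted units by 0.5, the bias is $0.2 \times 0.5 = 0.1$. The bias vanishes when either no units are omitted or omission does not moderate the effect.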
Generalizing causal effects with noncompliance: Application to deep canvassing experiments
ArXiv.org · 2025-05-30
Preprint · Open access · Senior author
Standard approaches to generalizability often focus on generalizing the intent-to-treat (ITT) effect. In practice, however, a more policy-relevant quantity is the generalized impact of an intervention among compliers. While instrumental variable (IV) methods are commonly used to estimate the complier average causal effect (CACE) within samples, standard approaches cannot be applied directly to a target population whose distribution differs from that of the study sample. This paper makes several contributions. First, we introduce a new set of identifying assumptions, in the form of a population-level exclusion restriction, that allows identification of the target complier average causal effect (T-CACE) in both randomized experiments and observational studies without relying on standard principal ignorability assumptions. Second, we propose a class of inverse-weighted estimators for the T-CACE and derive their asymptotic properties, with extensions for settings in which researchers have access to auxiliary compliance information across the target population. Finally, we introduce a sensitivity analysis that researchers can use to evaluate the robustness of the estimators to unmeasured confounding. We illustrate the proposed method through extensive simulations and a study evaluating the impact of deep canvassing on reducing exclusionary attitudes.
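One simple way to combine reweighting with an IV ratio is a weighted Wald estimator: a weighted ITT effect on the outcome divided by a weighted ITT effect on treatment uptake. The sketch below is a generic illustration, not the paper's estimator; it assumes a binary randomized instrument `z`, binary uptake `d`, outcome `y`, and weights `w` (estimated elsewhere) that reweight the study sample toward the target population. Whether such a ratio identifies the T-CACE depends on assumptions like those the paper develops; `weighted_wald` and `weighted_arm_mean` are hypothetical names.

```python
"""Sketch: a weighted Wald (ratio) estimator for a complier effect.

Assumptions (hypothetical, not the paper's exact estimator): binary
instrument z, binary uptake d, outcome y, and weights w estimated elsewhere
that reweight the study sample toward the target population.
"""

from typing import Sequence


def weighted_arm_mean(v: Sequence[float], z: Sequence[int], w: Sequence[float], arm: int) -> float:
    """Weighted mean of v among units whose instrument value equals `arm`."""
    num = sum(wi * vi for vi, zi, wi in zip(v, z, w) if zi == arm)
    den = sum(wi for zi, wi in zip(z, w) if zi == arm)
    if den == 0:
        raise ValueError(f"no weight in arm z={arm}")
    return num / den


def weighted_wald(y: Sequence[float], d: Sequence[int], z: Sequence[int], w: Sequence[float]) -> float:
    """Weighted ITT on the outcome divided by weighted ITT on uptake."""
    itt_y = weighted_arm_mean(y, z, w, 1) - weighted_arm_mean(y, z, w, 0)
    itt_d = weighted_arm_mean(d, z, w, 1) - weighted_arm_mean(d, z, w, 0)
    if itt_d == 0:
        raise ValueError("weighted first stage is zero; the ratio is undefined")
    return itt_y / itt_d
```

With all weights equal to 1, this reduces to the familiar within-sample Wald estimator of the CACE; the weights are what shift the target from the study sample to the target population.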
causalDT: Causal Distillation Trees
2025-09-03
Software · Open access
Causal Distillation Tree (CDT) is a machine learning method for estimating interpretable subgroups with heterogeneous treatment effects. CDT allows researchers to fit any machine learning model (or metalearner) to estimate heterogeneous treatment effects for each individual, and then "distills" these predicted effects into interpretable subgroups by fitting an ordinary decision tree to the previously estimated heterogeneous treatment effects. This package provides tools to estimate causal distillation trees, as detailed in Huang, Tang, and Kenney (2025) <doi:10.48550/arXiv.2502.07275>.
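The two-stage distillation idea can be sketched in pure Python. Every component here is an illustrative stand-in chosen for the sketch, not the causalDT package's API: a k-nearest-neighbor T-learner plays the role of the flexible first-stage model, and a depth-1 regression tree (a stump) plays the role of the distillation tree. The names `t_learner_cate` and `fit_stump` are hypothetical.

```python
"""Two-stage distillation sketch (illustrative stand-ins, not causalDT's API):
stage 1 estimates individual effects with a k-nearest-neighbor T-learner,
stage 2 fits a depth-1 regression tree (stump) to those estimates."""

from typing import List, Sequence, Tuple

Row = Sequence[float]


def knn_predict(train_x: List[Row], train_y: List[float], x: Row, k: int) -> float:
    """Mean outcome of the k training rows closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), yi)
        for row, yi in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


def t_learner_cate(x: List[Row], t: List[int], y: List[float], k: int = 3) -> List[float]:
    """Stage 1: separate outcome models per arm; effect = mu1(x) - mu0(x)."""
    x1 = [xi for xi, ti in zip(x, t) if ti == 1]
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    x0 = [xi for xi, ti in zip(x, t) if ti == 0]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    return [knn_predict(x1, y1, xi, k) - knn_predict(x0, y0, xi, k) for xi in x]


def fit_stump(x: List[Row], tau_hat: List[float]) -> Tuple[int, float, float, float]:
    """Stage 2: one split minimizing squared error of the estimated effects.
    Returns (feature index, threshold, left-subgroup mean, right-subgroup mean)."""
    best = None
    for j in range(len(x[0])):
        for thr in sorted({row[j] for row in x}):
            left = [v for row, v in zip(x, tau_hat) if row[j] <= thr]
            right = [v for row, v in zip(x, tau_hat) if row[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:
        raise ValueError("no valid split: every covariate is constant")
    _, j, thr, lm, rm = best
    return j, thr, lm, rm


# Toy data: the true effect is 2 for units with x0 >= 10 and 0 otherwise.
x = [[float(i), 0.0] for i in range(20)]
t = [i % 2 for i in range(20)]
y = [1.0 + (2.0 if i >= 10 else 0.0) * t[i] for i in range(20)]
tau_hat = t_learner_cate(x, t, y, k=3)
feature, threshold, left_mean, right_mean = fit_stump(x, tau_hat)
# Expected: feature 0, threshold 10.0, i.e. subgroups x0 <= 10 and x0 > 10.
```

The stump recovers a split on the covariate that actually moderates the effect, even though the first-stage estimates are noisy near the boundary. A real application would use a richer first-stage learner and a deeper tree, plus stability diagnostics of the kind the paper describes.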
Frequent coauthors
- Randall R. Rojas, University of California, Los Angeles (4 shared)
- Patrick D. Convery, University of California, Los Angeles (4 shared)
- Samuel D. Pimentel, University of California, Berkeley (3 shared)
- Erin Hartman, University of California, Berkeley (3 shared)
- Harsh Parikh, Johns Hopkins University (3 shared)
- Claude Messan Setodji (2 shared)
- Lane F. Burgette, RAND Corporation (2 shared)
- Brian Vegetabile, RAND Corporation (2 shared)
Labs
Melody Y. Huang Lab (PI)