Michael Pak-shing Leung
Associate Professor · University of California, Santa Cruz · Economics
Active 2002–2025
About
Michael P. Leung is an Associate Professor in the Economics Department, part of the Social Sciences Division at UC Santa Cruz, where he teaches and conducts research in economics. His selected publications focus on econometric and statistical methods for dependent data. These include causal inference under interference in cluster-randomized and network experiments, inference with network and spatial dependence, and machine learning methods for causal inference.
Research topics
- Computer Science
- Statistics
- Data Mining
- Mathematics
- Applied mathematics
- Mathematical optimization
- Algorithm
Selected publications
Cluster-Randomized Trials with Cross-Cluster Interference
Journal of the American Statistical Association · 2025-10-10
Article · Open access · 1st author · Corresponding
The literature on cluster-randomized trials typically allows for interference within but not across clusters. This may be implausible when units are irregularly distributed across space without well-separated communities, as clusters in such cases may not align with significant geographic, social, or economic divisions. This paper develops methods for reducing bias due to cross-cluster interference. We first propose an estimation strategy that excludes units not surrounded by clusters assigned to the same treatment arm. We show that this substantially reduces bias relative to conventional difference-in-means estimators without significant cost to variance. Second, we formally establish a bias-variance trade-off in the choice of clusters: constructing fewer, larger clusters reduces bias due to interference but increases variance. We provide a rule for choosing the number of clusters to balance the asymptotic orders of the bias and variance of our estimator. Finally, we consider unsupervised learning for cluster construction and provide theoretical guarantees for $k$-medoids.
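The exclusion step can be illustrated with a small sketch. It assumes that units have Euclidean coordinates and that treatment is assigned at the cluster level. It also assumes that a unit counts as "surrounded" when every unit within a fixed radius shares its arm. The radius rule and the function names `same_arm_neighborhood` and `trimmed_diff_in_means` are hypothetical illustrations, not the paper's exact definitions.

```python
import numpy as np


def same_arm_neighborhood(coords, treat, radius):
    """Return a boolean mask that is True for unit i when every unit within
    `radius` of i (i included) shares i's treatment arm.
    Assumption: "surrounded" is operationalized as a fixed-radius ball."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    near = dist <= radius                       # near[i, j]: j lies in i's ball
    same = treat[:, None] == treat[None, :]     # same[i, j]: j has i's arm
    # Keep unit i only if no nearby unit is in the other arm.
    return np.all(same | ~near, axis=1)


def trimmed_diff_in_means(y, treat, keep):
    """Difference in mean outcomes (treated minus control) over kept units."""
    treated = keep & (treat == 1)
    control = keep & (treat == 0)
    return y[treated].mean() - y[control].mean()


if __name__ == "__main__":
    # Six units on a line, split into two clusters; cluster 0 is treated.
    coords = np.array([[float(x), 0.0] for x in range(6)])
    labels = np.array([0, 0, 0, 1, 1, 1])
    treat = np.array([1, 0])[labels]            # unit arm inherited from its cluster
    y = np.array([5.0, 5.0, 9.0, 1.0, 0.0, 0.0])
    keep = same_arm_neighborhood(coords, treat, radius=1.0)
    print(keep)                                  # units 2 and 3 are dropped
    print(trimmed_diff_in_means(y, treat, keep))
```

Units 2 and 3 are excluded because their radius-1 neighborhoods straddle both clusters. The remaining units give a difference in means of 5.0.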
Neighborhood Stability in Double/Debiased Machine Learning with Dependent Data
ArXiv.org · 2025-11-14
Preprint · Open access · Senior author
This paper studies double/debiased machine learning (DML) methods applied to weakly dependent data. We allow observations to be situated in a general metric space that accommodates spatial and network data. Existing work implements cross-fitting by excluding from the training fold observations sufficiently close to the evaluation fold. We find in simulations that this can result in exceedingly small training fold sizes, particularly with network data. We therefore seek to establish the validity of DML without cross-fitting, building on recent work by Chen et al. (2022). They study i.i.d. data and require the machine learner to satisfy a natural stability condition requiring insensitivity to data perturbations that resample a single observation. We extend these results to dependent data by strengthening stability to "neighborhood stability," which requires insensitivity to resampling observations in any slowly growing neighborhood. We show that existing results on the stability of various machine learners can be adapted to verify neighborhood stability.
Neighborhood Stability in Double/Debiased Machine Learning with Dependent Data
SSRN Electronic Journal · 2025-01-01
Preprint · Open access · Senior author
Identifying Treatment and Spillover Effects Using Exposure Contrasts
arXiv (Cornell University) · 2024-03-13
Preprint · Open access · 1st author · Corresponding
To report spillover effects, a common practice is to regress outcomes on statistics summarizing neighbors' treatments. This paper studies nonparametric analogs of these estimands, which we refer to as exposure contrasts. We demonstrate that a contrast may have the opposite sign of the unit-level effects of interest even under unconfoundedness. We then provide interpretable conditions on interference and the assignment mechanism under which exposure contrasts can be represented as convex averages of the unit-level effects and therefore avoid sign reversals. These conditions encompass cluster-randomized trials, network experiments, and observational settings with peer effects in selection into treatment.
Discussion of 'Causal inference with misspecified exposure mappings: separating definitions and assumptions'
Biometrika · 2024-02-12 · 3 citations
Article · 1st author · Corresponding
Journal article. Biometrika, Volume 111, Issue 1, March 2024, pages 17–20. https://doi.org/10.1093/biomet/asad040. Received 14 June 2023; editorial decision 16 June 2023; published 12 February 2024.
Cluster-Randomized Trials with Cross-Cluster Interference
arXiv (Cornell University) · 2023-10-28
Preprint · Open access · 1st author · Corresponding
Network Cluster‐Robust Inference
Econometrica · 2023-01-01 · 14 citations
Article · 1st author · Corresponding
Since network data commonly consists of observations from a single large network, researchers often partition the network into clusters in order to apply cluster‐robust inference methods. Existing such methods require clusters to be asymptotically independent. Under mild conditions, we prove that, for this requirement to hold for network‐dependent data, it is necessary and sufficient that clusters have low conductance, the ratio of edge boundary size to volume. This yields a simple measure of cluster quality. We find in simulations that when clusters have low conductance, cluster‐robust methods control size better than HAC estimators. However, for important classes of networks lacking low‐conductance clusters, the former can exhibit substantial size distortion. To determine the number of low‐conductance clusters and construct them, we draw on results in spectral graph theory that connect conductance to the spectrum of the graph Laplacian. Based on these results, we propose to use the spectrum to determine the number of low‐conductance clusters and spectral clustering to construct them.
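The conductance measure (edge boundary size divided by volume) can be computed directly from an adjacency matrix. This sketch assumes an undirected, unweighted graph stored as a symmetric 0/1 NumPy array. It takes a cluster's volume to be the sum of its members' degrees. `conductance` and `max_conductance` are hypothetical helper names. The sketch does not cover how the clusters are constructed.

```python
import numpy as np


def conductance(adj, in_cluster):
    """Edges leaving the cluster divided by the cluster's volume.
    `in_cluster` is a boolean mask over nodes."""
    boundary = adj[np.ix_(in_cluster, ~in_cluster)].sum()  # edges crossing out
    volume = adj[in_cluster].sum()                          # sum of member degrees
    return boundary / volume


def max_conductance(adj, labels):
    """Worst-case (largest) conductance across all clusters in a partition."""
    return max(conductance(adj, labels == k) for k in np.unique(labels))


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
    adj = np.zeros((6, 6), dtype=int)
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1
    good = np.array([0, 0, 0, 1, 1, 1])   # split along the bridge
    bad = np.array([0, 1, 0, 1, 0, 1])    # interleaved split
    print(max_conductance(adj, good))     # 1 boundary edge / volume 7
    print(max_conductance(adj, bad))      # 5 boundary edges / volume 7
```

The bridge split has a far lower worst-case conductance than the interleaved split. This matches the intuition that good clusters have few edges leaving them relative to their size.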
Rate-optimal cluster-randomized designs for spatial interference
The Annals of Statistics · 2022-10-01 · 16 citations
Article · 1st author · Corresponding
We consider a potential outcomes model in which interference may be present between any two units but the extent of interference diminishes with spatial distance. The causal estimand is the global average treatment effect, which compares outcomes under the counterfactuals that all or no units are treated. We study a class of designs in which space is partitioned into clusters that are randomized into treatment and control. For each design, we estimate the treatment effect using a Horvitz–Thompson estimator that compares the average outcomes of units with all or no neighbors treated, where the neighborhood radius is of the same order as the cluster size dictated by the design. We derive the estimator’s rate of convergence as a function of the design and degree of interference and use this to obtain estimator-design pairs that achieve near-optimal rates of convergence under relatively minimal assumptions on interference. We prove that the estimators are asymptotically normal and provide a variance estimator. For practical implementation of the designs, we suggest partitioning space using clustering algorithms.
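A Horvitz–Thompson estimator of this kind can be sketched as follows. The sketch assumes each cluster is treated independently with probability `p`. Under that assumption, the chance that every unit within `radius` of unit i is treated is p^m, where m is the number of clusters that neighborhood touches. The chance that none is treated is (1 - p)^m. The assignment mechanism, the Euclidean radius, and the name `ht_global_ate` are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np


def ht_global_ate(y, coords, labels, cluster_treat, radius, p):
    """Horvitz-Thompson estimate of the global average treatment effect.
    Assumes clusters are treated independently with probability p."""
    diff = coords[:, None, :] - coords[None, :, :]
    near = np.sqrt((diff ** 2).sum(axis=-1)) <= radius  # neighborhood balls
    total = 0.0
    for i in range(len(y)):
        touched = np.unique(labels[near[i]])       # clusters meeting i's ball
        m = len(touched)
        all_treated = float(np.all(cluster_treat[touched] == 1))
        none_treated = float(np.all(cluster_treat[touched] == 0))
        # Inverse-probability weights: P(all treated) = p**m and
        # P(none treated) = (1 - p)**m under independent cluster assignment.
        total += y[i] * (all_treated / p**m - none_treated / (1 - p)**m)
    return total / len(y)


if __name__ == "__main__":
    coords = np.array([[float(x), 0.0] for x in range(6)])
    labels = np.array([0, 0, 0, 1, 1, 1])
    cluster_treat = np.array([1, 0])               # one realized assignment
    y = np.array([6.0, 6.0, 9.0, 1.0, 3.0, 3.0])
    print(ht_global_ate(y, coords, labels, cluster_treat, radius=1.0, p=0.5))
```

Units 2 and 3 have neighborhoods that touch both clusters, so they receive zero weight under this realized assignment. Only units whose whole neighborhood falls in one arm contribute.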
Graph Neural Networks for Causal Inference Under Network Confounding
arXiv (Cornell University) · 2022-11-15 · 1 citation
Preprint · Open access · 1st author · Corresponding
This paper studies causal inference with observational data from a single large network. We consider a nonparametric model with interference in both potential outcomes and selection into treatment. Specifically, both stages may be the outcomes of simultaneous equations models, allowing for endogenous peer effects. This results in high-dimensional network confounding where the network and covariates of all units constitute sources of selection bias. In contrast, the existing literature assumes that confounding can be summarized by a known, low-dimensional function of these objects. We propose to use graph neural networks (GNNs) to adjust for network confounding. When interference decays with network distance, we argue that the model has low-dimensional structure that makes estimation feasible and justifies the use of shallow GNN architectures.
Dependence‐robust inference using resampled statistics
Journal of Applied Econometrics · 2021-07-28 · 3 citations
Preprint · Open access · 1st author · Corresponding
We develop inference procedures robust to general forms of weak dependence. The procedures utilize test statistics constructed by resampling in a manner that does not depend on the unknown correlation structure of the data. We prove that the statistics are asymptotically normal under the weak requirement that the target parameter can be consistently estimated at the parametric rate. This holds for regular estimators under many well‐known forms of weak dependence and justifies the claim of dependence robustness. We consider applications to settings with unknown or complicated forms of dependence, with various forms of network dependence as leading examples. We develop tests for both moment equalities and inequalities.
Recent grants
Statistical Theory for Economic Models of Network Formation
NSF · $112k · 2018–2020
Frequent coauthors
- Jessie Li · 27 shared
- Han Hong · 27 shared
- A. Ronald Gallant · Pennsylvania State University · 25 shared
- Hyungsik Roger Moon · Yonsei University · 7 shared
- Hossein Alidaee · 3 shared
- Eric Auerbach · 3 shared
- Manju Monga · UC San Diego Health System · 1 shared
- Karen Bishop · Australian National University · 1 shared