Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Annie Qu · Professor

University of California, Santa Barbara · Statistics and Applied Probability

Active 2000–2026

h-index: 36
Citations: 4.4k
Papers: 188 (61 in the last 5 years)

Selected publications

  • Multi-Task Learning for Heterogeneous Multi-Source Block-Wise Missing Data

    Figshare · 2026-01-01

    article · Open access · Senior author

    Multi-task learning (MTL) has emerged as an imperative machine learning tool to solve multiple learning tasks simultaneously and has been successfully applied to healthcare, marketing, and biomedical fields. However, in order to borrow information across different tasks effectively, it is essential to utilize both homogeneous and heterogeneous information. Among the extensive literature on MTL, various forms of heterogeneity are presented in MTL problems, such as block-wise, distribution, and posterior heterogeneity. Existing methods, however, struggle to tackle these forms of heterogeneity simultaneously in a unified framework. In this paper, we propose a two-step learning strategy for MTL which addresses the aforementioned heterogeneity. First, we impute the missing blocks using shared representations extracted from homogeneous source across different tasks. Next, we disentangle the mappings between input features and responses into a shared component and a task-specific component, respectively, thereby enabling information borrowing through the shared component. Our numerical experiments and real-data analysis from the ADNI database demonstrate the superior MTL performance of the proposed method compared to other competing methods.

  • Physical Activity Buffers Physiological Stress during High Emotional Distress: A Wearable-Derived Prospective Cohort Study

    SSRN Electronic Journal · 2026-01-01

    preprint · Open access · Senior author

  • Optimal Transport based Cross-Domain Integration for Heterogeneous Data

    Journal of the American Statistical Association · 2025-07-03 · 2 citations

    article · Senior author

    Detecting dynamic patterns shared across heterogeneous datasets is a critical yet challenging task in many scientific domains, particularly within the biomedical sciences. Systematic heterogeneity inherent in diverse data sources can significantly hinder the effectiveness of existing machine learning methods in uncovering shared underlying dynamics. Additionally, practical and technical constraints in real-world experimental designs often limit data collection to only a small number of subjects, even when rich, time-dependent measurements are available for each individual. These limited sample sizes further diminish the power to detect common dynamic patterns across subjects. In this article, we propose a novel heterogeneous data integration framework based on optimal transport to extract shared patterns in the conditional mean dynamics of target responses. The key advantage of the proposed method is its ability to enhance discriminative power by reducing heterogeneity unrelated to the signal. This is achieved through the alignment of extracted domain-shared temporal information across multiple datasets from different domains. Our approach is effective regardless of the number of datasets and does not require auxiliary matching information for alignment. Specifically, the method aligns longitudinal data from heterogeneous datasets within a common latent space, capturing shared dynamic patterns while leveraging temporal dependencies within subjects. Theoretically, we establish generalization error bounds for the proposed data integration approach in supervised learning tasks, highlighting a novel tradeoff between data alignment and pattern learning. Additionally, we derive convergence rates for the barycentric projection under Gromov-Wasserstein and fused Gromov-Wasserstein distances. 
    Numerical studies on both simulated data and neuroscience applications demonstrate that the proposed data integration framework substantially improves prediction accuracy by effectively aggregating information across diverse data sources and subjects. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.

  • Time-varying mediation analysis for incomplete data with application to DNA methylation study for PTSD

    The Annals of Applied Statistics · 2025-12-01

    article · Senior author

    DNA methylation (DNAm) has been shown to mediate causal effects from traumatic experiences to posttraumatic stress disorder (PTSD). However, the scientific question about whether the mediation effect changes over time remains unclear. In this paper we develop time-varying structural equation models to identify cytosine-phosphate-guanine (CpG) sites, where DNAm mediates the effect of trauma exposure on PTSD, and to capture dynamic changes in mediation effects. The proposed methodology is motivated by the Detroit Neighborhood Health Study (DNHS) with high-dimensional and longitudinal DNAm measurements. To handle the nonmonotone missing DNAm in the dataset, we propose a novel longitudinal multiple imputation (LMI) method utilizing dependency among repeated measurements and employ the generalized method of moments to integrate the multiple imputations. Simulations confirm that the proposed method outperforms existing approaches in various longitudinal settings. In DNHS data analysis, our method identifies several CpG sites where DNAm exhibits dynamic mediation effects. Some of the corresponding genes have been shown to be associated with PTSD in the existing literature, and our findings on their time-varying effects could deepen the understanding of the mediation role of DNAm on the causal path from trauma exposure to PTSD risk.

  • Covariate-Elaborated Robust Partial Information Transfer with Conditional Spike-and-Slab Prior

    Journal of the American Statistical Association · 2025-12-05

    article · Senior author

    The popularity of transfer learning stems from the fact that it can borrow information from useful auxiliary datasets. Existing statistical transfer learning methods usually adopt a global similarity measure between the source data and the target data, which may lead to inefficiency when only partial information is shared. In this paper, we propose a novel Bayesian transfer learning method named “CONCERT” to allow robust partial information transfer for high-dimensional data analysis. A conditional spike-and-slab prior is introduced in the joint distribution of target and source parameters for information transfer. By incorporating covariate-specific priors, we can characterize partial similarities and integrate source information collaboratively to improve the performance on the target. In contrast to existing work, the CONCERT is a one-step procedure which achieves variable selection and information transfer simultaneously. We establish variable selection consistency, as well as estimation and prediction error bounds for CONCERT. Our theory demonstrates the covariate-specific benefit of transfer learning. To ensure the scalability of the algorithm, we adopt the variational Bayes framework to facilitate implementation. Extensive experiments and two real data applications showcase the validity and advantages of CONCERT over existing cutting-edge transfer learning methods.

  • Heterogeneous effects of physical activity on physiological stress during pregnancy

    medRxiv · 2025-03-31 · 2 citations

    preprint · Open access · Senior author · Corresponding

    Pregnancy is a critical period characterized by profound physiological and psychological adaptations that can significantly impact both maternal and fetal health outcomes. Thus, it is imperative to implement targeted and evidence-based interventions to enhance maternal well-being during the prenatal period. Mobile health (mHealth) technologies enable continuous, real-time monitoring of both physiological and psychological states, providing detailed insights into health behaviors and individual responses in natural settings. This study leveraged mHealth technologies, including the Oura smart ring and ecological momentary assessment (EMA) via a mobile app, to examine how emotional distress influences the relationship between physical activity (PA) and heart rate variability (HRV), an indicator of physiological stress during pregnancy. Consenting participants, aged 18-40 years, with a healthy singleton pregnancy in the second trimester, were enrolled in the study. Our findings revealed that among participants experiencing emotional distress, increased PA was associated with higher HRV, indicating lower physiological stress. In particular, this beneficial effect was more pronounced on days with elevated emotional distress. These findings suggest that engaging in physical activity may help protect pregnant women against autonomic dysregulation associated with emotional distress, potentially supporting maternal cardiovascular health. By utilizing mHealth technologies for real-time data collection, our study highlights the potential for personalized and adaptive interventions that promote maternal well-being by encouraging physical activity as a strategy to mitigate the physiological effects of emotional distress during pregnancy.

    Author summary: Pregnancy involves significant physiological and psychological changes that impact maternal and fetal health. Heart rate variability (HRV), a key biomarker of autonomic function and cardiovascular health, reflects the body’s ability to regulate physiological stress. In this study, we used mobile health (mHealth) technologies, including the Oura smart ring and ecological momentary assessment (EMA) through a mobile app, to examine how emotional distress influences the relationship between physical activity (PA) and HRV during pregnancy. Our findings indicate that PA protects against autonomic dysregulation linked to emotional distress. On days with emotional distress, engaging in more PA was associated with higher HRV, indicating lower physiological stress. This positive effect was even stronger on days with increased emotional distress. By leveraging real-time monitoring, our study highlights the value of mHealth in capturing dynamic interactions between emotional and physiological states. These insights demonstrate the potential for personalized interventions using mHealth technologies to support maternal well-being by encouraging PA as a strategy to better manage physiological stress during pregnancy.

  • Meta Fusion: A Unified Framework For Multimodality Fusion with Mutual Learning

    ArXiv.org · 2025-07-27

    preprint · Open access

    Developing effective multimodal data fusion strategies has become increasingly essential for improving the predictive power of statistical machine learning methods across a wide range of applications, from autonomous driving to medical diagnosis. Traditional fusion methods, including early, intermediate, and late fusion, integrate data at different stages, each offering distinct advantages and limitations. In this paper, we introduce Meta Fusion, a flexible and principled framework that unifies these existing strategies as special cases. Motivated by deep mutual learning and ensemble learning, Meta Fusion constructs a cohort of models based on various combinations of latent representations across modalities, and further boosts predictive performance through soft information sharing within the cohort. Our approach is model-agnostic in learning the latent representations, allowing it to flexibly adapt to the unique characteristics of each modality. Theoretically, our soft information sharing mechanism reduces the generalization error. Empirically, Meta Fusion consistently outperforms conventional fusion strategies in extensive simulation studies. We further validate our approach on real-world applications, including Alzheimer's disease detection and neural decoding.

  • Heterogeneous effects of physical activity on physiological stress during pregnancy

    PLOS Digital Health · 2025-10-22 · 1 citation

    article · Open access · Senior author · Corresponding

    Pregnancy involves rapid physiological and psychological changes that can increase vulnerability to health complications, underscoring the need for timely, individualized support. Mobile health (mHealth) tools offer a scalable way to capture repeated measures of health status throughout pregnancy, facilitating longitudinal assessment and the opportunity for timely intervention. This study leveraged mHealth technologies, including the Oura smart ring and ecological momentary assessment (EMA) via a mobile app, to examine how emotional distress affects the relationship between physical activity (PA) and heart rate variability (HRV), an indicator of physiological stress during pregnancy. Specifically, we examined whether emotional distress, measured via daily EMA surveys, moderates the association between physical activity and nighttime HRV, captured by continuous Oura ring data. Hence, this analysis integrated temporally aligned wearable and self-report data to investigate the interaction between subjective emotional states and objectively measured physical activity patterns. Consenting participants, aged 18-40 years, with a healthy singleton pregnancy in the second trimester, were enrolled in the study. Our findings revealed that on days with high emotional distress, each additional 1,000 steps was associated with a 3.5% increase in nighttime HRV (p-value < 0.001; 95% CI: 2.6%, 4.4%). In contrast, physical activity had little to no association with HRV on days with moderate distress (0.6%; 95% CI: -0.7%, 1.9%) and low distress (0.6%; 95% CI: -0.4%, 1.5%). These findings suggest that physical activity may be particularly beneficial on high-distress days, supporting the development of adaptive interventions that prioritize PA engagement during periods of elevated emotional distress. 
    Based on our model-estimated moderation effects, we may recommend that a pregnant woman increase her physical activity on high-distress days due to a strong positive PA-HRV association, whereas for those who do not experience much emotional distress, the recommendation may be less emphasized, given the weaker observed association.

  • Multi-task Learning for Heterogeneous Data via Integrating Shared and Task-Specific Encodings

    ArXiv.org · 2025-05-30

    preprint · Open access · Senior author

    Multi-task learning (MTL) has become an essential machine learning tool for addressing multiple learning tasks simultaneously and has been effectively applied across fields such as healthcare, marketing, and biomedical research. However, to enable efficient information sharing across tasks, it is crucial to leverage both shared and heterogeneous information. Despite extensive research on MTL, various forms of heterogeneity, including distribution and posterior heterogeneity, present significant challenges. Existing methods often fail to address these forms of heterogeneity within a unified framework. In this paper, we propose a dual-encoder framework to construct a heterogeneous latent factor space for each task, incorporating a task-shared encoder to capture common information across tasks and a task-specific encoder to preserve unique task characteristics. Additionally, we explore the intrinsic similarity structure of the coefficients corresponding to learned latent factors, allowing for adaptive integration across tasks to manage posterior heterogeneity. We introduce a unified algorithm that alternately learns the task-specific and task-shared encoders and coefficients. In theory, we investigate the excess risk bound for the proposed MTL method using local Rademacher complexity and apply it to a new but related task. Through simulation studies, we demonstrate that the proposed method outperforms existing data integration methods across various settings. Furthermore, the proposed method achieves superior predictive performance for time to tumor doubling across five distinct cancer types in PDX data.

  • Reinforcement learning for individual optimal policy from heterogeneous data

    The Annals of Statistics · 2025-08-01 · 1 citation

    article · Open access · Senior author

    Offline reinforcement learning (RL) aims to find optimal policies in dynamic environments in order to maximize the expected total rewards by leveraging pre-collected data. Learning from heterogeneous data is one of the fundamental challenges in offline RL. Traditional methods focus on learning an optimal policy for all individuals with pre-collected data from a single episode or homogeneous batch episodes, and thus, may result in a suboptimal policy for a heterogeneous population. In this paper, we propose an individualized offline policy optimization framework for heterogeneous time-stationary Markov decision processes (MDPs). The proposed heterogeneous model with individual latent variables enables us to efficiently estimate the individual Q-functions, and our Penalized Pessimistic Personalized Policy Learning (P4L) algorithm guarantees a fast rate on the average regret under a weak partial coverage assumption on behavior policies. In addition, our simulation studies and a real data application demonstrate the superior numerical performance of the proposed method compared with existing methods.
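
As a worked illustration of the moderation estimates quoted in the PLOS Digital Health abstract above (a 3.5%, 0.6%, and 0.6% change in nighttime HRV per additional 1,000 steps on high-, moderate-, and low-distress days), the sketch below scales those point estimates to a chosen number of extra steps. It assumes the association scales linearly with step count, which the abstract does not state. The function and dictionary names are hypothetical, and this is not the authors' model.

```python
# Point estimates quoted in the abstract: percent change in nighttime HRV
# per additional 1,000 steps, by daily emotional-distress level.
PCT_PER_1000_STEPS = {"high": 3.5, "moderate": 0.6, "low": 0.6}


def expected_hrv_change_pct(extra_steps: float, distress: str) -> float:
    """Return the implied percent change in nighttime HRV.

    Assumption (not from the paper): the effect scales linearly in steps.
    """
    if distress not in PCT_PER_1000_STEPS:
        raise ValueError(f"unknown distress level: {distress!r}")
    return PCT_PER_1000_STEPS[distress] * extra_steps / 1000.0


if __name__ == "__main__":
    # Compare the implied change for 2,000 extra steps at each level.
    for level in ("high", "moderate", "low"):
        print(level, expected_hrv_change_pct(2000, level))
```

Under that linearity assumption, the same 2,000 extra steps imply a several-fold larger HRV change on high-distress days than on other days, which is the contrast the abstract draws. The confidence intervals quoted there are not propagated here.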
