Rebecca A Hubbard

Verified

University of Pennsylvania · Rehabilitation Medicine

Active 1931–2026

h-index: 50
Citations: 12.7k
Papers: 482 (201 in the last 5 years)
Funding: $792k
About

Rebecca A Hubbard, Ph.D., is an Adjunct Professor of Biostatistics and Epidemiology at the University of Pennsylvania Perelman School of Medicine. She serves as a Senior Scholar at the Center for Clinical Epidemiology and Biostatistics, a Senior Fellow at the Institute for Biomedical Informatics, and is a member of the Abramson Cancer Center. She also holds the position of Vice Chair for Faculty Professional Development in the Department of Biostatistics, Epidemiology & Informatics. Her research focuses on the development and application of statistical methodology for studies using observational data from community medical practice. This includes evaluation of screening and diagnostic test performance, methods for comparative effectiveness studies, and health services research. Dr. Hubbard's methodological work emphasizes the development of statistical tools for valid inference from complex electronic medical record data, which she has applied to studies in cancer screening, aging and dementia, pharmacoepidemiology, women’s health, and behavioral health.

Research topics

  • Medicine
  • Internal medicine
  • Political Science
  • Cardiology
  • Medical emergency
  • Surgery
  • Oncology
  • Emergency medicine

Selected publications

  • Racial and Ethnic Differences in Prevalence and Incidence of Diabetic Retinal Disease

    Retina · 2026-03-31

    article

    PURPOSE: To determine prevalence and incidence trends of diabetic retinal disease (DRD) and its vision-threatening forms over the last 20 years among patients with diabetes mellitus (DM) among US racial and ethnic groups. METHODS: A retrospective cohort study of members of commercial and Medicare Advantage health plans between 2000 and 2022 was conducted, with cohorts of White (W), Black/African American (B/AA), Hispanic (H), and Asian (A) DM patients identified using ICD-9/10 codes. Outcomes included annual prevalence and incidence of DRD, diabetic macular edema (DME), and proliferative diabetic retinopathy (PDR). Multivariable logistic and Poisson regression models analyzed trends in prevalence odds ratios and incidence rate ratios, respectively. RESULTS: B/AA patients had higher prevalence rates of DRD every year analyzed compared with White patients (2021 B/AA:23.1%; W:19.0%; p<0.001). Both Hispanic (2001 H:12.3%) and Asian (2001:11.9%) patients initially had lower DRD prevalence than White patients (2001:13.1%; p<0.001 for both); both are now higher with Hispanic patients having the highest rates (2021 H:26.0%; A:21.2%; W:19.0%, p<0.001). DME and PDR prevalence increased across all groups through 2015/2016, then decreased through 2021 (2021 DME: W:4.5%, B/AA:5.9%; H:5.9%, A:4.7%; 2021 PDR: W:2.9%, B/AA:4.3%, H:5.0%, A:2.9%). Since 2009, incidence rates for DRD, DME, and PDR in Hispanic and B/AA patients have been higher than for White patients (IRR=1.08-1.85; p<0.001 for all comparisons). Asian patients initially had higher DRD incidence rates than White patients, but that difference disappeared in 2021 before increasing again in 2022 (2022 IRR=1.07, 95%CI=1.01-1.14). CONCLUSION: Disparities in prevalence and incidence of DRD, DME, and PDR persist for B/AA and have worsened for Hispanic patients.

  • Performance of Statistical and Machine Learning Risk Prediction Models for Advanced Breast Cancers

    Cancer Epidemiology Biomarkers & Prevention · 2026-05-14

    article

    BACKGROUND: Machine learning enables complex risk prediction models, but comparative performance with statistical approaches remains context-dependent. We compared statistical and machine learning models for predicting advanced breast cancer risk. METHODS: Using data from 968,178 women (40-74 years) undergoing 2,796,459 annual or 812,126 biennial screening mammograms (2005-2019) in the Breast Cancer Surveillance Consortium, we cross-validated models predicting advanced breast cancer within 12 months (annual) or 24 months (biennial) following screening. Models included conventional logistic regression, regularized regressions (LASSO, Elastic net), and machine learning methods (random forests, gradient boosting), considering a modest number of clinical and demographic predictors. Performance was assessed using calibration and area under the receiver operating characteristic curve (AUC). RESULTS: Discrimination was similar across models (AUC 0.677-0.690). Calibration differences were more pronounced. Regularized regressions achieved the most favorable calibration overall and across racial and ethnic groups, with AUC 0.689 (95%CI = 0.676-0.701). Gradient boosting showed comparable AUC but suboptimal calibration (calibration slope 1.12; 95%CI = 1.04-1.20). Conventional logistic regression had slightly lower AUC (0.683; 95%CI = 0.671-0.696) and calibration slope of 0.90 (95%CI = 0.83-0.96). Regression-based approaches were generally well calibrated across racial and ethnic groups (E/O ratio 0.96-1.03; calibration intercept -0.03 to 0.04), with some subgroup deviations in calibration slopes (<1). CONCLUSIONS: For predicting advanced breast cancers, regularized regression demonstrated similar discrimination and generally more favorable calibration than other approaches. IMPACT: In settings with rare outcomes and low dimensional features, regularized regression may offer a practical balance between performance and interpretability.

  • Statistical Methods for Phenotype Estimation and Analysis Using Electronic Health Records [Methods Study], 2016-2021

    ICPSR Data Holdings · 2026-03-23

    dataset · Open access · 1st author · Corresponding

    Researchers can use data from electronic health records, or EHRs, in studies that compare two or more treatments. In these studies, researchers need to identify all patients with the same phenotype. Phenotypes are a person's known traits, like height and weight, or known health problems, like diabetes. However, in EHR data, some data on patient traits or health problems may be missing for some patients. Missing data in EHRs make it hard to correctly identify all patients with the same phenotype. It's even harder when data are missing due to a patient's health status. For example, patients with uncontrolled diabetes may need more lab tests than patients with controlled diabetes. As a result, researchers who are looking at lab tests may not identify patients with controlled diabetes as having diabetes. In this project, the research team developed and tested a new statistical method that accounts for missing EHR data to estimate patient phenotypes. To access the methods and software, please visit the bias_correction GitHub repository.

  • Informed presence in electronic health record data: illustrating bias and bias reduction approaches in longitudinal analyses

    Epidemiology · 2026-03-18

    article

    Electronic health record (EHR) systems capture patient information inconsistently, with patients generally contributing more data when they are sick than healthy. This creates "informed presence," systematic differences between captured and non-captured data, potentially biasing association estimates. There is growing interest in methods that account for informed presence, but practical approaches for conceptualizing, identifying, and addressing this bias in applied EHR research have received limited attention. Focusing on longitudinal settings, we present a conceptual framework for informed presence bias, which arises when data capture depends on exposure and outcome and thus the visit process acts as a collider. We then illustrate methods that aim to reduce bias by reweighting or resampling observed data to approximate conditional independence between the visit process and the outcome. We illustrate these methods using longitudinal EHR data from pediatric solid organ transplant recipients (N=271) to examine the association between steroids and cytomegalovirus viremia, where the frequency of cytomegalovirus testing varies across patients and over time. Incidence rate ratios decreased from 1.83 (95% CI 1.02, 3.28) in a naïve analysis to 1.37 (0.73, 2.57) when accounting for informed presence using inverse intensity weighting. Incidence rate ratio estimates from bootstrapped inverse intensity weighting were 1.37 (0.71, 2.27) and 1.40 (0.73, 2.68) from multiple outputation. These results show the anticipated attenuation of effect estimates after accounting for informed presence bias. When analyzing irregularly measured EHR data, we recommend (1) identifying the expected observation process using conceptual diagrams, (2) assessing dependence in the observation process, and (3) accounting for outcome dependence in statistical analysis.

  • Missingness in Eligibility Criteria for Target Trial Emulation in EHR With Survival Outcomes

    Statistics in Medicine · 2026-04-01

    article · Open access · Senior author · Corresponding

    In certain settings, when conducting a randomized trial would be infeasible, electronic health records (EHR) can be used to emulate a target trial and estimate causal effects of an intervention. This process involves specifying the elements of a hypothetical trial protocol and applying these to the design of an observational study conducted with EHR data (or other observational data source). One element of target trial specification includes defining eligibility criteria. However, defining the eligible population with EHR can be complicated by missingness in eligibility-defining variables. Multiple imputation (MI) is one common approach to missingness in EHR data, but it is unclear whether imputation of eligibility criteria should occur before or after excluding ineligible individuals. Motivated by a target trial emulation of two treatments for advanced breast cancer, we explore this question when estimating the average causal effect under a target trial framework with survival outcomes. We illustrate how alternative MI strategies perform using simulated data and in a real-world analysis of oncology EHR data. We found that in most settings with high proportions of missingness in eligibility-defining variables, imputing missing data using a flexible imputation model, such as a random forest, prior to excluding ineligible individuals resulted in lower bias than complete case analysis or imputation after excluding ineligible individuals. Choices about how to handle practical challenges such as this in the application of target trial emulation to messy, real-world data sources can have substantial effects on causal parameter estimation and should be carefully considered to ensure that the results of observational studies are as rigorous as possible.

  • An E-value-Informed Sensitivity Analysis Framework for Hybrid Controlled Trials

    medRxiv · 2026-03-06

    article · Open access · Senior author · Corresponding

    Hybrid controlled trials (HCTs) incorporate real-world data into randomized controlled trials (RCTs) by augmenting the internal control arm with patients receiving the same treatment in routine care. Beyond increasing power, HCTs may improve recruitment by supporting unequal randomization ratios that increase patient access to experimental treatments. However, HCT validity is threatened by bias from unmeasured confounding due to lack of randomization of external controls, leading to outcome non-exchangeability between internal and external control patients. To address this challenge, we developed a sensitivity analysis framework to assess the robustness of HCT results to potential unmeasured confounding. We propose a tipping point analysis that adapts the E-value framework to the HCT setting where trial participation rather than treatment assignment is subject to confounding. To aid interpretation, we also introduce a data-driven benchmark representing the strength of unmeasured confounding reflected by the observed outcome non-exchangeability. We then propose an operational decision rule and evaluate its performance through simulation studies. Finally, we illustrate the approach using an asthma trial augmented by data from electronic health records. Simulation results demonstrate that our decision rule safeguards against Type I error inflation while preserving the power gains achieved by incorporating external data. In settings where moderate unmeasured confounding led to poorer outcomes for external controls, Type I error was controlled near the nominal 5% level, and power increased by 10-20% compared with analyses using RCT data alone. Our approach provides a practical, interpretable method to assess HCT robustness, supporting rigorous inference when integrating external real-world data.

  • STRASS 2 target trial emulation: Bridging the gap between trial efficacy and real-world effectiveness

    Surgery · 2026-04-04

    article

  • Racial disparities and utilization trends of first-line targeted therapies for metastatic breast cancer

    JNCI Cancer Spectrum · 2026-04-21

    article · Open access

    PURPOSE: We aimed to determine temporal trends and racial disparities in utilization and time to treatment initiation (TTI) of CDK4/6 inhibitors (CDK4/6i) and pertuzumab for first-line metastatic breast cancer (MBC). DESIGN: We extracted data from a nationwide electronic health record-derived deidentified database. Female patients ≥18 years old with ER+/HER2- or HER2+ MBC eligible for CDK4/6i (3/2015-10/2021) or pertuzumab (07/2012-09/2021) were included. Our outcomes were adjusted temporal trends in the proportion of patients receiving respective therapies using logistic regression with natural cubic splines for time trends and tested for changes in utilization over time within and between racial groups (non-Hispanic White (NHW) or non-Hispanic Black (NHB)). Similar models using linear regression estimated mean TTI. RESULTS: 5173 (NHW = 4478; NHB = 695) ER+/HER2- and 2321 (NHW = 1915; NHB = 406) HER2+ MBC patients were included. There were significant differences in the proportion initiating CDK4/6i over time within racial groups (NHW, 23.5% (95%CI: 20.1%-27.3%) in 2015 to 53.8% (95%CI: 48.6%-59.0%) in 2021; NHB, 20.6% (95%CI: 11.9%-33.0%) in 2015 to 73.6% (95%CI: 61.7%-83.0%) in 2021) and between groups (p = 0.009). There was a significant increase in utilization of pertuzumab within both racial groups over time (p < 0.001), but no significant difference between groups (p = 0.45). TTI decreased over time with no significant differences in TTI trends between the two groups. CONCLUSIONS: Utilization of targeted therapies increased over time, however NHB patients were less likely to receive CDK4/6i compared to NHW. Approximately half of eligible patients did not receive pertuzumab. Further research is needed to understand mediators and design interventions to address underutilization of these therapies and those contributing to racial disparities in CDK4/6i utilization.

  • Tubeless automated insulin delivery versus multiple daily injections in children and adults with type 1 diabetes with elevated HbA1c (RADIANT): a multicentre, international, parallel-group, open-label, randomised, controlled trial

    Repository@Nottingham (University of Nottingham) · 2026-02-23

    article · Open access

Frequent coauthors

  • Karla Kerlikowske

    San Francisco VA Health Care System

    154 shared
  • Diana L. Miglioretti

    138 shared
  • Louise M. Henderson

    126 shared
  • Diana S. M. Buist

    Menlo School

    125 shared
  • Tracy Onega

    122 shared
  • Weiwei Zhu

    Second Hospital of Anhui Medical University

    94 shared
  • Janie M. Lee

    Fred Hutch Cancer Center

    86 shared
  • Laura Ichikawa

    Kaiser Permanente Washington Health Research Institute

    85 shared

Education

  • PhD, Biostatistics

    University of Washington

    2007
  • MSc, Statistics

    University of Oxford

    2002
  • MSc, Epidemiology

    University of Edinburgh

    2001
  • BS, Biology

    University of Pittsburgh

    1999