
Paul Avillach
Harvard University · Biomedical Informatics
Active 2007–2026
About
Paul Avillach, MD, PhD, is an Associate Professor of Biomedical Informatics at Harvard Medical School. He also holds appointments as an Assistant Professor of Pediatrics at Boston Children's Hospital and in the Department of Epidemiology at the Harvard T.H. Chan School of Public Health. His research in biomedical informatics integrates computational methods to advance the understanding and treatment of disease. He leads the Avillach Lab, which develops new approaches for biomedical discovery and clinical decision-making. His work applies data science and informatics to improve healthcare outcomes and facilitate translational research.
Research topics
- Medicine
- Genetics
- Artificial Intelligence
- Internal medicine
- Biology
- Pathology
- Computer Science
- Emergency medicine
- Demography
- Intensive care medicine
- Evolutionary biology
- Computational biology
- Data science
- Pediatrics
- Bioinformatics
Selected publications
Rare KDR Variants Define a Distinct Genetic Contribution to Congenital Heart Disease
Circulation: Genomic and Precision Medicine · 2026-04-22
article · Open access
The AIM-AHEAD Research Fellowship Program: Improving Health Research with AI/ML for all Americans.
Zenodo (CERN European Organization for Nuclear Research) · 2025-07-09
article · Open access · Senior author
The AIM-AHEAD Research Fellowship Program is an initiative supporting early-career researchers, including graduate students, postdoctoral scholars, junior faculty, and those conducting research outside academia. The program provides funding and support for data science projects that address health research questions using AI/ML algorithms or data. In line with the broader goals of the AIM-AHEAD initiative, the program emphasizes the use of artificial intelligence and machine learning (AI/ML) methods to analyze biomedical research data, including clinical and genomic cohorts. Its primary focus is North Star (III): use AI/ML to improve behavioral health, cardiometabolic health, and cancer outcomes for all. Interest in the program has grown with each cohort: the first cohort began in September 2022, the second in September 2023, the third in October 2024, and the fourth is expected to start in September 2025. Applications grew from 41 for the first cohort to 85 for the fourth. Over the same period, the number of available datasets expanded from one in 2022 to five in 2025, supporting research areas including mental health, substance use disorder, diabetes, cardiovascular disease, maternal health, HIV, food insecurity, and sleep health. The fellowship gives early-career researchers a platform to engage in AI/ML research with the potential to significantly advance health research. By supporting innovative projects and fostering a collaborative research environment, AIM-AHEAD aims to make meaningful progress toward improving health for all Americans.
Circulation · 2025-11-03
article
Background: Right ventricular (RV) dysfunction is an important determinant of outcomes in many forms of congenital heart disease (CHD). However, clinical and imaging biomarkers explain only 25% of the variability in long-term outcomes in patients with repaired tetralogy of Fallot (rTOF), and the contribution of genetic factors to RV dysfunction in rTOF remains a knowledge gap. Hypothesis: Genetic factors contribute to variation in RV function in patients with rTOF. Methods: We studied the relationship between rare damaging genetic variants (RDV) and RV function, assessed by cardiac MRI, in a cohort of 223 rTOF patients with genome or exome sequencing, recruited under the auspices of the Pediatric Cardiac Genomics Consortium at Boston Children's Hospital. We characterized demographics, genotypes, clinical and postoperative variables, and cardiac MRI measurements. Rare variants (gnomAD allele frequency <10⁻⁴) were considered damaging if predicted to be loss-of-function (nonsense, frameshift, or read-through), missense with a REVEL score >0.5, or splice-altering (SpliceAI delta score >0.8). Association was assessed by comparing the proportion of participants carrying an RDV in 70 genes associated with pediatric-onset cardiomyopathy, as reported in ClinVar, between those with and without RV dysfunction (RV ejection fraction <45%). We performed multivariable logistic regression to adjust for independent predictors of RV dysfunction. Results: Patients with rTOF and RV dysfunction were older at MRI (15.3 vs. 12.6 years, p=0.01) and more likely to be male (71% vs. 49%, p=0.002), to have a history of arrhythmia (26% vs. 11%, p=0.008), or to have been repaired before 1985 (17% vs. 6%, p=0.01). 22q11 deletion syndrome was not associated with RV dysfunction. Heterozygous RDV in genes associated with pediatric cardiomyopathy were more common in patients with RV dysfunction (11% vs. 1%; OR 8.63, p=0.007).
In a multivariable model (C statistic = 0.71), the presence of a pediatric cardiomyopathy variant remained associated with RV dysfunction (OR 1.44, p=0.01). Conclusions: RDV in genes associated with pediatric cardiomyopathy are associated with RV dysfunction in patients with rTOF. These genes would not be expected to cause CHD but rather to modify myocardial function. Although larger multicenter studies are needed to validate these findings, the results suggest that pediatric cardiomyopathy variants may affect outcomes and that identifying them could improve risk stratification and enable more precise, personalized therapies for CHD.
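The variant-filtering rule in the Methods above can be written as a small predicate. This is a minimal sketch, not the study's pipeline: the `Variant` record, its field names, the consequence labels, and any gene panel passed in are hypothetical, and the sketch assumes the SpliceAI criterion applies to a variant of any consequence type. The thresholds (allele frequency below 10⁻⁴, REVEL above 0.5, SpliceAI delta above 0.8) are the ones stated in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Consequence labels treated as loss-of-function, per the abstract's list.
LOF_CONSEQUENCES = {"nonsense", "frameshift", "read-through"}


@dataclass
class Variant:
    """Hypothetical annotated-variant record; field names are illustrative."""
    gene: str
    gnomad_af: float                        # gnomAD allele frequency (0.0 if absent)
    consequence: str                        # e.g. "missense", "nonsense", "synonymous"
    revel: Optional[float] = None           # REVEL score (missense only)
    spliceai_delta: Optional[float] = None  # maximum SpliceAI delta score


def is_rare_damaging(v: Variant, af_cutoff: float = 1e-4) -> bool:
    """Rare (AF < cutoff) and predicted LoF, damaging missense, or splice-altering."""
    if v.gnomad_af >= af_cutoff:
        return False
    is_lof = v.consequence in LOF_CONSEQUENCES
    is_damaging_missense = (
        v.consequence == "missense" and v.revel is not None and v.revel > 0.5
    )
    # Assumption: the splice criterion is applied regardless of consequence type.
    is_splice_altering = v.spliceai_delta is not None and v.spliceai_delta > 0.8
    return is_lof or is_damaging_missense or is_splice_altering


def carries_panel_rdv(variants: Iterable[Variant], panel: set[str]) -> bool:
    """True if any rare damaging variant falls in a gene from the (hypothetical) panel."""
    return any(v.gene in panel and is_rare_damaging(v) for v in variants)
```

For example, a missense variant in a panel gene with allele frequency 2e-5 and REVEL 0.8 passes the filter, while the same variant at allele frequency 2e-4 does not. Keeping the three damaging criteria as separate booleans makes it easy to report which rule flagged each variant.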
BMJ Public Health · 2025-07-01
article · Open access · Senior author
Introduction: Non-pharmaceutical interventions (NPIs) such as mask-wearing and social distancing, implemented as public health measures to slow COVID-19 transmission, had a major impact on the epidemiology of viral infections. However, little is known about their influence on bacterial infections in children. Methods: We performed a multicentre observational study including eight hospitals in three countries (Spain, UK and USA). All hospitalisations of children under the age of 18 from January 2019 to February 2023 were included. Electronic health record data were used to assess changes in hospitalisations for bacterial infections across three periods defined by NPI stringency: pre-NPI (January 2019 to February 2020), full NPI (March 2020 to February 2021) and partial NPI (March 2021 to February 2023). The primary outcomes were the counts of hospitalisations for invasive, respiratory and skin-associated bacterial infections. To identify changes in the monthly counts of bacterial infections in a data-driven manner, we used a multivariable quasi-Poisson regression model with an adaptive lasso penalty, adjusting for important covariates. We then assessed the statistical significance of the identified changes and examined the temporal trend before and after each change point. Results: Of the 508 585 paediatric hospitalisations, 41 076 (8.1%) were associated with any bacterial infection. Of these, 14 656 (35.7%) were invasive bacterial infections, 6763 (16.5%) were respiratory tract-associated and 7757 (18.9%) were skin-associated. Counts of bacterial infections decreased during the full-NPI period (average 93.7 infections/month) compared with the pre-NPI period (average 104.8 infections/month) and increased during the partial-NPI period (average 112.4 infections/month).
A quasi-Poisson regression model showed a significant decrease in respiratory tract-associated bacterial infections after the start of the COVID-19 pandemic and a subsequent significant increase after the gradual lifting of NPIs, peaking during the winter of 2022-2023. No significant changes were observed over time for skin-associated or invasive bacterial infections. Conclusions: The implementation of COVID-19 NPIs was significantly associated with changes in hospitalisations for respiratory tract-associated bacterial infections, but not for invasive or skin-associated bacterial infections. These findings suggest that NPIs had the greatest impact on respiratory infections and indicate the potential of targeted NPIs to reduce these infections among children in the future.
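The abstract does not give the full model specification, so the sketch below shows only the quasi-Poisson ingredient in its simplest form. It assumes a log-linear model whose only predictor is the NPI period, with no covariates and no adaptive-lasso change-point search. With a single categorical predictor, the fitted mean for each month is its period's average count, so the rate ratio against the pre-NPI period is a ratio of averages; quasi-Poisson keeps those point estimates and scales their variance by a dispersion φ, estimated as the Pearson chi-square over the residual degrees of freedom. The function name and the counts used to exercise it are hypothetical.

```python
import math
from statistics import mean


def quasi_poisson_by_period(counts_by_period: dict[str, list[int]]):
    """Quasi-Poisson log-linear fit with period indicators as the only predictor.

    The first key is the reference period. Assumes every period has a positive
    mean and there are more months than periods. Returns per-period results
    (mean, rate ratio vs. reference, SE of log rate ratio) and the dispersion phi.
    """
    periods = list(counts_by_period)
    means = {p: mean(counts_by_period[p]) for p in periods}

    # Pearson chi-square over all months, divided by residual degrees of freedom
    pearson = sum(
        (y - means[p]) ** 2 / means[p] for p in periods for y in counts_by_period[p]
    )
    n_obs = sum(len(ys) for ys in counts_by_period.values())
    phi = pearson / (n_obs - len(periods))

    base = periods[0]
    base_total = len(counts_by_period[base]) * means[base]  # n_g * mu_g
    results = {}
    for p in periods:
        total = len(counts_by_period[p]) * means[p]
        rate_ratio = means[p] / means[base]
        # Var(log mu_g) = phi / (n_g * mu_g); the reference has no contrast
        se_log_rr = 0.0 if p == base else math.sqrt(phi * (1 / total + 1 / base_total))
        results[p] = {"mean": means[p], "rate_ratio": rate_ratio, "se_log_rr": se_log_rr}
    return results, phi
```

Applied to the reported averages, the unadjusted full-NPI rate ratio would be about 93.7 / 104.8 ≈ 0.89 and the partial-NPI ratio about 112.4 / 104.8 ≈ 1.07. Overdispersed monthly counts give φ > 1, which widens the standard errors relative to a plain Poisson fit; that is the usual reason to prefer quasi-Poisson for hospitalisation counts.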
Use of Computational Phenotypes for Predicting Genetic Subgroups of Cerebral Palsy
Pediatric Neurology · 2025-09-25
article
UNC Libraries · 2025-05-13
article · Open access
BACKGROUND: Many countries have experienced 2 predominant waves of COVID-19-related hospitalizations. Comparing the clinical trajectories of patients hospitalized in separate waves of the pandemic enables further understanding of the evolving epidemiology, pathophysiology, and health care dynamics of the COVID-19 pandemic. OBJECTIVE: In this retrospective cohort study, we analyzed electronic health record (EHR) data from patients with SARS-CoV-2 infections hospitalized in participating health care systems representing 315 hospitals across 6 countries. We compared hospitalization rates, severe COVID-19 risk, and mean laboratory values between patients hospitalized during the first and second waves of the pandemic. METHODS: Using a federated approach, each participating health care system extracted patient-level clinical data on its first- and second-wave cohorts and submitted aggregated data to the central site. Data quality control steps were applied at the central site to correct implausible values and harmonize units. Statistical analyses were performed by computing effect sizes for each health care system and synthesizing them using random-effects meta-analyses to account for heterogeneity. We focused the laboratory analysis on C-reactive protein (CRP), ferritin, fibrinogen, procalcitonin, D-dimer, and creatinine based on their reported associations with severe COVID-19. RESULTS: Data were available for 79,613 patients, of whom 32,467 were hospitalized in the first wave and 47,146 in the second wave. The prevalence of male patients and patients aged 50 to 69 years decreased significantly between the first and second waves. Patients hospitalized in the second wave had a 9.9% reduction in the risk of severe COVID-19 compared to patients hospitalized in the first wave (95% CI 8.5%-11.3%).
Demographic subgroup analyses indicated that patients aged 26 to 49 years and 50 to 69 years; male and female patients; and black patients had significantly lower risk for severe disease in the second wave than in the first wave. At admission, the mean values of CRP were significantly lower in the second wave than in the first wave. On the seventh hospital day, the mean values of CRP, ferritin, fibrinogen, and procalcitonin were significantly lower in the second wave than in the first wave. In general, countries exhibited variable changes in laboratory testing rates from the first to the second wave. At admission, there was a significantly higher testing rate for D-dimer in France, Germany, and Spain. CONCLUSIONS: Patients hospitalized in the second wave were at significantly lower risk for severe COVID-19. This corresponded to mean laboratory values in the second wave that were more likely to be in typical physiological ranges on the seventh hospital day compared to the first wave. Our federated approach demonstrated the feasibility and power of harmonizing heterogeneous EHR data from multiple international health care systems to rapidly conduct large-scale studies to characterize how COVID-19 clinical trajectories evolve.
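The METHODS describe computing an effect size at each health care system and pooling them with random-effects meta-analysis. The abstract does not name the estimator; the sketch below assumes the common DerSimonian-Laird method-of-moments estimate of the between-site variance τ², and the function name and its inputs (per-site effect estimates with their sampling variances) are illustrative rather than the consortium's code.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool per-site effect sizes with a DerSimonian-Laird random-effects model.

    effects   -- per-site effect estimates (e.g. log risk ratios)
    variances -- their within-site sampling variances
    Returns (pooled effect, standard error, tau^2, 95% CI).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted spread of site estimates around the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-site variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to every site's variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, se, tau2, ci
```

Because τ² is added to every site's variance, heterogeneous sites receive more even weights than under fixed-effect pooling, so no single large health care system dominates the pooled estimate; when the sites agree, τ² is zero and the result reduces to inverse-variance pooling.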
Use of Computational Phenotypes for Predicting Genetic Subgroups of Cerebral Palsy
medRxiv · 2025-02-13
preprint · Open access
Introduction: Emerging evidence suggests that 20-30% of cases of cerebral palsy (CP) may have a genetic cause. Our group previously identified subsets of patients with CP or CP-masquerading conditions who warrant genetic testing, including those with regression or progressive neurological symptoms (CP masqueraders) and those without any known risk factors for CP (cryptogenic CP). Recognizing these subgroups in clinical settings remains challenging. Methods: To address this challenge, we developed and evaluated a computational phenotyping approach that uses ICD-9/ICD-10 billing codes to automatically identify patients with unexplained CP or CP-masquerading conditions who may benefit from genetic testing. We applied this approach to a cohort of 250 participants from the Boston Children's Hospital CP Sequencing Study, which aims to identify genetic causes of CP and CP-masquerading conditions. Results: Manual review, used as the gold standard, classified 8% of participants as CP masqueraders, 42% as cryptogenic CP, and 50% as non-cryptogenic CP. Computational phenotyping based on ICD-9/10 codes achieved a sensitivity of 95%, specificity of 72%, positive predictive value of 77%, and negative predictive value of 94% in identifying cases warranting genetic testing. Conclusions: Our findings demonstrate the feasibility of using computational phenotyping to identify patients with CP or CP-masquerading conditions who warrant genetic testing. Further studies are needed to evaluate the effectiveness and real-world application of this tool in larger healthcare systems. Nonetheless, the approach holds promise as a clinical decision support tool that could be integrated into electronic health record systems, enhancing clinical workflows and facilitating actionable genetic diagnoses.
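The four accuracy measures reported above all come from a 2×2 table of computational-phenotype calls against manual review. The sketch below computes them; the `ConfusionCounts` name and the counts in the example are hypothetical, since the abstract reports only the percentages.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Phenotype calls vs. manual review (the reference standard)."""
    tp: int  # flagged by codes, warrants testing on review
    fp: int  # flagged by codes, does not warrant testing on review
    tn: int  # not flagged, does not warrant testing on review
    fn: int  # not flagged, warrants testing on review


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only (not the study's data)
print(diagnostic_metrics(ConfusionCounts(tp=90, fp=20, tn=80, fn=10)))
```

Sensitivity and specificity do not depend on how common the target group is, but PPV and NPV do: the reported values reflect a cohort in which manual review classified 8% as CP masqueraders and 42% as cryptogenic CP, and they would shift in a population with a different mix.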
The High Prevalence of Rare Pulmonary Diseases Among Patients With Severe Asthma
American Journal of Respiratory and Critical Care Medicine · 2025-05-01
article
Rationale: Rare pulmonary diseases (RPDs) in children, such as childhood interstitial lung disease (chILD) and genetic forms of bronchiectasis (including cystic fibrosis [CF] and primary ciliary dyskinesia [PCD]), are difficult to diagnose and often misdiagnosed as asthma, delaying appropriate treatment. Early diagnosis through newborn genetic screening has dramatically increased CF life expectancy, highlighting the importance of early RPD diagnosis for early intervention and preservation of pulmonary function. Anecdotal evidence suggests that RPDs are often initially misdiagnosed as severe or difficult-to-treat asthma, which delays proper care and exposes children to harmful and costly asthma treatments. However, the extent of this problem in clinical practice has not been explored. Objective: This study aims to estimate the prevalence of undiagnosed RPDs in children with severe asthma, testing the hypothesis that RPDs are more common in severe than in non-severe asthma. Methods: We leveraged data from the Genomic Information Commons (GIC), a network of six academic children's hospitals with access to electronic health records (EHRs), biosamples, and genomic data. Using GIC's PIC-SURE query tool, we analyzed EHRs from 14,907,602 patients for ICD-10 codes related to asthma, severe asthma, bronchiectasis, and chILD. Chi-square tests were used to assess differences in RPD prevalence between severe and non-severe asthma patients. For 6,019 asthma patients with exome sequence data, we further analyzed rare genetic variants associated with RPD. Results: Of 441,864 patients diagnosed with asthma across GIC hospitals, 10,753 had severe asthma (2.4%), 11,388 had bronchiectasis (2.6%), and 7,834 had chILD (1.8%). Among patients with severe asthma, RPD prevalence was 5.9 times higher than in those with non-severe asthma (4.7% vs. 0.8%, p < 10⁻¹⁶), with notable increases for both bronchiectasis (3.5% vs. 0.5%, 6.8-fold increase) and chILD (1.2% vs. 0.3%, 4.3-fold increase). Among patients with exome sequence data, those with severe asthma showed a 4.1-fold increase in RPD-related functional genetic variants compared with non-severe asthma cases (17.3% vs. 4.2%, p < 0.00001). Filaggrin (FLG) gene variants, linked to severe asthma, were present in about one-third of severe cases. After excluding FLG variants, the enrichment remained significant, especially in genes causing PCD. Conclusion: This large-scale analysis suggests that 1 in every 21 patients with severe asthma carries an RPD diagnosis. Although the diagnoses rely on ICD-10 codes, the observation is further supported by a 4-fold increase in RPD-causing loss-of-function genetic variants, underscoring the need for heightened clinical awareness of these disorders in patients with severe asthma.
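The severe versus non-severe comparison above is a 2×2 contingency test, and the fold increases are prevalence ratios. The sketch below computes a Pearson chi-square without continuity correction and the corresponding prevalence ratio; whether the study applied a continuity correction is not stated, and the function names and example counts are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Rows: severe / non-severe asthma; columns: RPD code present / absent.
    Returns (chi2 statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def prevalence_ratio(a: int, b: int, c: int, d: int) -> float:
    """Prevalence in row 1 divided by prevalence in row 2 (the 'fold increase')."""
    return (a / (a + b)) / (c / (c + d))
```

For example, a hypothetical table with 20 of 100 severe and 10 of 100 non-severe patients carrying an RPD code gives a prevalence ratio of 2.0 and χ² ≈ 3.92 (p ≈ 0.048). As a check on the abstract's figures, 4.7 / 0.8 ≈ 5.9 and 17.3 / 4.2 ≈ 4.1, matching the reported fold increases.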
International electronic health record-derived COVID-19 clinical course profiles: the 4CE consortium
UNC Libraries · 2025-06-26
article · Open access
PIC-SURE: an open-source platform for integrating clinical and genomic data
npj Digital Medicine · 2025-12-30
article · Open access · Senior author
PIC-SURE is an open-source platform for integrating and analyzing large-scale clinical and genomic data. Part of the NIH NHLBI BioData Catalyst® ecosystem, PIC-SURE enables real-time cohort building and analysis across 1.4 million participants in 273 studies (as of May 2025). Through a graphical interface and an API, researchers can explore and analyze complex datasets in real time. This flexibility supports scalable, reproducible research and lowers the barrier to integrated analysis of clinical and genomic data.
Frequent coauthors
- Alba Gutiérrez‐Sacristán · Harvard University · 123 shared
- Isaac S. Kohane · Harvard University · 123 shared
- Gabriel A. Brat · Harvard University · 116 shared
- Griffin M. Weber · 113 shared
- Tianxi Cai · Harvard University · 112 shared
- Anita Burgun · Hôpital Européen Georges-Pompidou · 110 shared
- Antoine Neuraz · Centre de Recherche des Cordeliers · 106 shared
- Amelia L.M. Tan · Harvard University · 104 shared
Labs
Avillach Lab · PI