Devika Subramanian

Professor of Computer Science · Verified

Rice University · Computer Science

Active 1985–2024

h-index: 32

Citations: 3.7k

Papers: 140 (21 in the last 5 years)

Funding: $400k

About

Devika Subramanian is a Professor of Computer Science and of Electrical and Computer Engineering at Rice University. She is a member of the Ken Kennedy Institute, the Rice Digital Health Initiative, and the Houston Methodist-Rice Digital Health Institute. Her research spans artificial intelligence and machine learning, with applications in computational systems biology, the neuroscience of human learning, hurricane risk assessment, power grid network analysis, mortality prediction in cardiology, conflict forecasting, the analysis of terrorist networks, and the analysis of unstructured text data. She develops algorithms and tools for predictive modeling across engineering, the natural sciences, and the social sciences.

Dr. Subramanian holds a PhD (1989) and an MS (1984) in Computer Science from Stanford University, and a BTech (1982) in Computer Science and Electrical Engineering from the Indian Institute of Technology Kharagpur.

Her honors include invited plenary talks at the Heart Failure Society of America (2009) and the International Joint Conference on Artificial Intelligence (2007), and the Julia Miles Chance Prize for Excellence in Teaching at Rice University (2000). She has been a CRA Distinguished Lecturer at the University of Washington and has served on the IJCAI Advisory Board and the editorial board of the Journal of AI Research.

Research topics

Machine Learning
Computer Science
Artificial Intelligence
Applied mathematics
Theoretical computer science
Mathematics
Algorithm

Selected publications

A Machine Learning Model for Risk Stratification of Postdiagnosis Diabetic Ketoacidosis Hospitalization in Pediatric Type 1 Diabetes: Retrospective Study
JMIR Diabetes · 2024-08-07 · 4 citations
article · Open access · 1st author
Background: Diabetic ketoacidosis (DKA) is the leading cause of morbidity and mortality in pediatric type 1 diabetes (T1D), occurring in approximately 20% of patients, with an economic cost of $5.1 billion/year in the United States. Despite multiple risk factors for postdiagnosis DKA, there is still a need for explainable, clinic-ready models that accurately predict DKA hospitalization in established patients with pediatric T1D. Objective: We aimed to develop an interpretable machine learning model to predict the risk of postdiagnosis DKA hospitalization in children with T1D using routinely collected time-series of electronic health record (EHR) data. Methods: We conducted a retrospective case-control study using EHR data from 1787 patients from among 3794 patients with T1D treated at a large tertiary care US pediatric health system from January 2010 to June 2018. We trained a state-of-the-art, explainable, gradient-boosted ensemble (XGBoost) of decision trees with 44 regularly collected EHR features to predict postdiagnosis DKA. We measured the model’s predictive performance using the area under the receiver operating characteristic curve, weighted F1-score, weighted precision, and recall, in a 5-fold cross-validation setting. We analyzed Shapley values to interpret the learned model and gain insight into its predictions. Results: Our model distinguished the cohort that develops DKA postdiagnosis from the one that does not (P<.001). It predicted postdiagnosis DKA risk with an area under the receiver operating characteristic curve of 0.80 (SD 0.04), a weighted F1-score of 0.78 (SD 0.04), and a weighted precision and recall of 0.83 (SD 0.03) and 0.76 (SD 0.05) respectively, using a relatively short history of data from routine clinic follow-ups post diagnosis. On analyzing Shapley values of the model output, we identified key risk factors predicting postdiagnosis DKA both at the cohort and individual levels. We observed sharp changes in postdiagnosis DKA risk with respect to 2 key features (diabetes age and glycated hemoglobin at 12 months), yielding time intervals and glycated hemoglobin cutoffs for potential intervention. By clustering model-generated Shapley values, we automatically stratified the cohort into 3 groups with 5%, 20%, and 48% risk of postdiagnosis DKA. Conclusions: We have built an explainable, predictive, machine learning model with potential for integration into clinical workflow. The model risk-stratifies patients with pediatric T1D and identifies patients with the highest postdiagnosis DKA risk using limited follow-up data starting from the time of diagnosis. The model identifies key time points and risk factors to direct clinical interventions at both the individual and cohort levels. Further research with data from multiple hospital systems can help us assess how well our model generalizes to other populations. The clinical importance of our work is that the model can predict patients most at risk for postdiagnosis DKA and identify preventive interventions based on mitigation of individualized risk factors.
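The abstract above reports a weighted F1-score. As a rough illustration of what that metric measures, and not the authors' evaluation code, the sketch below computes the support-weighted mean of per-class F1 scores in plain Python. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        # Each class contributes in proportion to how often it occurs.
        score += (n / total) * f1
    return score


# Hypothetical labels (1 = DKA hospitalization, 0 = none); not study data.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(round(weighted_f1(y_true, y_pred), 3))
```

Weighting by class support keeps the majority class from being ignored while still letting the rarer class affect the score, which matters when cases are much less common than controls.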

OR32-02 Thyroid Cancer Polygenic Risk Score Improves Risk Stratification Of Thyroid Nodules When Added To Ultrasound Imaging
Journal of the Endocrine Society · 2023-10-01 · 1 citation
article · Open access
Abstract Disclosure: N. Pozdeyev: None. M. Dighe: None. M. Barrio: None. C. Raeburn: None. H.A. Smith: None. M. Fisher: None. S. Chavan: None. N. Rafaels: None. J. Shortt: None. M. Leu: None. T. Clark: None. C. Marshall: None. B.R. Haugen: None. D. Subramanian: None. K. Crooks: None. C. Gignoux: None. T.A. Cohen: None. Purpose. Evaluating thyroid nodules to rule out malignancy is a common clinical task. Image-based risk stratification schemas rely on the presence of high-risk thyroid nodule sonographic features and, therefore, are less suitable for the diagnosis of malignant thyroid nodules that have a benign appearance on the ultrasound. To mitigate the deficiency of thyroid nodule evaluation relying solely on the sonographic characteristics, we used thyroid cancer polygenic risk score (PRS) to complement deep learning analysis of ultrasound images. Methods. A supervised deep learning classifier of thyroid nodules was trained on 32,545 thyroid US images from 621 nodules and tested on an independent set of 232 nodules from patients genotyped on the Illumina's MEGAEX platform. The deep-learning thyroid nodule classifier was developed by fine-tuning a BiT-M ResNet-50x1 convolutional neural network (CNN) pre-trained on the ImageNet-21k dataset. A polygenic risk score (PRS) was calculated using thyroid cancer genome-wide association meta-analysis summary statistics from the Global Biobank Meta-analysis Initiative. The thyroid cancer PRS was defined as a weighted sum of five alleles with the strongest association with thyroid cancer. CNN predictions and PRS were combined into a meta-classifier using logistic regression with or without genetic ancestry and demographic covariates. Results. The CNN classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.83 on the out-of-sample test set of 232 thyroid nodules. 
The CNN classifier incorrectly classified thyroid nodules without suspicious sonographic characteristics belonging to difficult-to-diagnose subtypes such as follicular thyroid cancer and follicular variant of papillary thyroid cancer. Combining predictions from the CNN classifier with PRS into a cross-validated classifier improved AUC to 0.868 (DeLong test, p = 0.05). Incorporating genetic ancestry in the form of five genetic principal components further improved AUC of the benign vs. malignant CNN + PRS thyroid nodule classifier to 0.885 (p = 0.007). Finally, when age, sex, and nodule dimensions were considered, the AUC of the meta-classifier increased to 0.915 (p = 2.3e-4). The meta-classifier including predictions from CNN, PRS, and genetic principal components showed a sensitivity of 0.95, specificity of 0.61, NPV of 0.97, and PPV of 0.5. This performance was superior to that of the clinical Thyroid Imaging Reporting and Data System (TI-RADS) as reported by radiologists in ultrasound reports. Conclusions. For the first time, we showed PRS provides an orthogonal thyroid cancer risk assessment complementary to the ultrasound image-based thyroid nodule risk evaluation. This proof-of-concept study opens an opportunity for developing next-generation schemas for thyroid nodule evaluation incorporating clinical, imaging, and genetic data. Presentation: Sunday, June 18, 2023
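The abstract above defines the polygenic risk score as a weighted sum of five alleles. A minimal sketch of that arithmetic follows, assuming each variant is encoded as a risk-allele dosage (0, 1, or 2 copies) and each weight is a per-variant effect size. The weights and genotype shown are hypothetical and are not taken from the study.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages, one weight per variant."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(w * d for w, d in zip(weights, dosages))


# Hypothetical effect sizes for five variants (assumption, not study values).
weights = [0.5, 0.25, 0.25, 0.125, 0.125]
# Hypothetical genotype: copies of the risk allele carried at each variant.
dosages = [2, 0, 1, 1, 0]

score = polygenic_risk_score(dosages, weights)
print(score)
```

In the study the score is then passed, together with the CNN prediction, to a logistic regression meta-classifier; that step is not shown here.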

Thyroid cancer polygenic risk score improves classification of thyroid nodules as benign or malignant.
Zenodo (CERN European Organization for Nuclear Research) · 2023-07-26
dataset · Open access
Supplementary Data for the manuscript: Thyroid cancer polygenic risk score improves classification of thyroid nodules as benign or malignant. Nikita Pozdeyev, MD, PhD (ORCiD ID: 0000-0001-8574-1972), Manjiri Dighe, MD, Martin Barrio, MD, MS, Christopher Raeburn, MD, Harry Smith, MS, Matthew Fisher, MS, Sameer Chavan, MS, Nicholas Rafaels, MS, Jonathan A. Shortt, PhD, Meng Lin, PhD, Michael G. Leu, MD, Toshimasa Clark, MD, Carrie Marshall, MD, Bryan R. Haugen, MD, Devika Subramanian, PhD, Regeneron Genetics Center, Kristy Crooks, PhD, Christopher Gignoux, PhD, Trevor Cohen, MBChB, PhD, FACMI

A Machine Learning Model for Risk Stratification of Postdiagnosis Diabetic Ketoacidosis Hospitalization in Pediatric Type 1 Diabetes: Retrospective Study (Preprint)
2023-10-04
preprint · Open access · 1st author · Corresponding
Background: Diabetic ketoacidosis (DKA) is the leading cause of morbidity and mortality in pediatric type 1 diabetes (T1D), occurring in approximately 20% of patients, with an economic cost of $5.1 billion/year in the United States. Despite multiple risk factors for postdiagnosis DKA, there is still a need for explainable, clinic-ready models that accurately predict DKA hospitalization in established patients with pediatric T1D. Objective: We aimed to develop an interpretable machine learning model to predict the risk of postdiagnosis DKA hospitalization in children with T1D using routinely collected time-series of electronic health record (EHR) data. Methods: We conducted a retrospective case-control study using EHR data from 1787 patients from among 3794 patients with T1D treated at a large tertiary care US pediatric health system from January 2010 to June 2018. We trained a state-of-the-art, explainable, gradient-boosted ensemble (XGBoost) of decision trees with 44 regularly collected EHR features to predict postdiagnosis DKA. We measured the model’s predictive performance using the area under the receiver operating characteristic curve, weighted F1-score, weighted precision, and recall, in a 5-fold cross-validation setting. We analyzed Shapley values to interpret the learned model and gain insight into its predictions. Results: Our model distinguished the cohort that develops DKA postdiagnosis from the one that does not (P<.001). It predicted postdiagnosis DKA risk with an area under the receiver operating characteristic curve of 0.80 (SD 0.04), a weighted F1-score of 0.78 (SD 0.04), and a weighted precision and recall of 0.83 (SD 0.03) and 0.76 (SD 0.05) respectively, using a relatively short history of data from routine clinic follow-ups post diagnosis. On analyzing Shapley values of the model output, we identified key risk factors predicting postdiagnosis DKA both at the cohort and individual levels. We observed sharp changes in postdiagnosis DKA risk with respect to 2 key features (diabetes age and glycated hemoglobin at 12 months), yielding time intervals and glycated hemoglobin cutoffs for potential intervention. By clustering model-generated Shapley values, we automatically stratified the cohort into 3 groups with 5%, 20%, and 48% risk of postdiagnosis DKA. Conclusions: We have built an explainable, predictive, machine learning model with potential for integration into clinical workflow. The model risk-stratifies patients with pediatric T1D and identifies patients with the highest postdiagnosis DKA risk using limited follow-up data starting from the time of diagnosis. The model identifies key time points and risk factors to direct clinical interventions at both the individual and cohort levels. Further research with data from multiple hospital systems can help us assess how well our model generalizes to other populations. The clinical importance of our work is that the model can predict patients most at risk for postdiagnosis DKA and identify preventive interventions based on mitigation of individualized risk factors.

Stratification of Pediatric COVID-19 Cases Using Inflammatory Biomarker Profiling and Machine Learning
Journal of Clinical Medicine · 2023-08-22 · 1 citation
article · Open access · 1st author · Corresponding
While pediatric COVID-19 is rarely severe, a small fraction of children infected with SARS-CoV-2 go on to develop multisystem inflammatory syndrome (MIS-C), with substantial morbidity. An objective method with high specificity and high sensitivity to identify current or imminent MIS-C in children infected with SARS-CoV-2 is highly desirable. The aim was to learn about an interpretable novel cytokine/chemokine assay panel providing such an objective classification. This retrospective study was conducted on four groups of pediatric patients seen at multiple sites of Texas Children's Hospital, Houston, TX who consented to provide blood samples to our COVID-19 Biorepository. Standard laboratory markers of inflammation and a novel cytokine/chemokine array were measured in blood samples of all patients. Group 1 consisted of 72 COVID-19, 70 MIS-C and 63 uninfected control patients seen between May 2020 and January 2021 and predominantly infected with pre-alpha variants. Group 2 consisted of 29 COVID-19 and 43 MIS-C patients seen between January and May 2021 infected predominantly with the alpha variant. Group 3 consisted of 30 COVID-19 and 32 MIS-C patients seen between August and October 2021 infected with alpha and/or delta variants. Group 4 consisted of 20 COVID-19 and 46 MIS-C patients seen between October 2021 and January 2022 infected with delta and/or omicron variants. Group 1 was used to train an L1-regularized logistic regression model which was tested using five-fold cross validation, and then separately validated against the remaining naïve groups. The area under the receiver operating characteristic curve (AUROC) and F1-score were used to quantify the performance of the cytokine/chemokine assay-based classifier. Standard laboratory markers predict MIS-C with a five-fold cross-validated AUROC of 0.86 ± 0.05 and an F1 score of 0.78 ± 0.07, while the cytokine/chemokine panel predicted MIS-C with a five-fold cross-validated AUROC of 0.95 ± 0.02 and an F1 score of 0.91 ± 0.04, with only sixteen of the forty-five cytokines/chemokines sufficient to achieve this performance. Tested on Group 2 the cytokine/chemokine panel yielded AUROC = 0.98 and F1 = 0.93, on Group 3 it yielded AUROC = 0.89 and F1 = 0.89, and on Group 4 AUROC = 0.99 and F1 = 0.97. Adding standard laboratory markers to the cytokine/chemokine panel did not improve performance. A top-10 subset of these 16 cytokines achieves equivalent performance on the validation data sets. Our findings demonstrate that a sixteen-cytokine/chemokine panel as well as the top ten subset provides a highly sensitive and specific method to identify MIS-C in patients infected with SARS-CoV-2 of all the major variants identified to date.
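AUROC is the headline metric in the abstract above. One way to read it is as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as half. The sketch below computes it that way in plain Python. It is an illustration only, not the study's code, and the labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # A correctly ordered pair scores 1, a tie 0.5, a reversed pair 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = MIS-C, 0 = COVID-19 without MIS-C) and scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(auroc(labels, scores), 3))
```

The pairwise loop is quadratic in the number of samples, which is fine for cohorts of this size; rank-based formulas give the same value faster on large data sets.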

Stratification of Pediatric COVID-19 cases by inflammatory biomarker profiling and machine learning
medRxiv · 2023-04-04 · 1 citation
preprint · Open access · 1st author · Corresponding
An objective method to identify imminent or current multisystem inflammatory syndrome in children (MIS-C) among children infected with SARS-CoV-2 is highly desirable. The aim was to define an algorithmically interpreted novel cytokine/chemokine assay panel providing such an objective classification. This study was conducted on 4 groups of patients seen at multiple sites of Texas Children's Hospital, Houston, TX who consented to provide blood samples to our COVID-19 Biorepository. Standard laboratory markers of inflammation and a novel cytokine/chemokine array were measured in blood samples of all patients. Group 1 consisted of 72 COVID-19, 66 MIS-C and 63 uninfected control patients seen between May 2020 and January 2021 and predominantly infected with pre-alpha variants. Group 2 consisted of 29 COVID-19 and 43 MIS-C patients seen between January and May 2021 infected predominantly with the alpha variant. Group 3 consisted of 30 COVID-19 and 32 MIS-C patients seen between August and October 2021 infected with alpha and/or delta variants. Group 4 consisted of 20 COVID-19 and 46 MIS-C patients seen between October 2021 and January 2022 infected with delta and/or omicron variants. Group 1 was used to train an L1-regularized logistic regression model which was validated using 5-fold cross validation, and then separately validated against the remaining naïve groups. The area under the receiver operating characteristic curve (AUROC) and F1-score were used to quantify the performance of the algorithmically interpreted cytokine/chemokine assay panel. Standard laboratory markers predict MIS-C with a 5-fold cross-validated AUROC of 0.86 ± 0.05 and an F1 score of 0.78 ± 0.07, while the cytokine/chemokine panel predicted MIS-C with a 5-fold cross-validated AUROC of 0.95 ± 0.02 and an F1 score of 0.91 ± 0.04, with only sixteen of the forty-five cytokines/chemokines sufficient to achieve this performance. Tested on Group 2, the cytokine/chemokine panel yielded AUROC = 0.98 and F1 = 0.93; on Group 3, AUROC = 0.89 and F1 = 0.89; and on Group 4, AUROC = 0.99 and F1 = 0.97. Adding standard laboratory markers to the cytokine/chemokine panel did not improve performance. A top-10 subset of these 16 cytokines achieves equivalent performance on the validation data sets. Our findings demonstrate that a sixteen-cytokine/chemokine panel as well as the top ten subset provides a sensitive, specific method to identify MIS-C in patients infected with SARS-CoV-2 of all the major variants identified to date.

Thyroid cancer polygenic risk score combined with deep learning analysis of ultrasound images improves the classification of thyroid nodules as benign or malignant
medRxiv · 2023-04-17 · 1 citation
preprint · Open access
Evaluating thyroid nodules to rule out malignancy is a very common clinical task. Image-based clinical and machine learning risk stratification schemas rely on the presence of thyroid nodule high-risk sonographic features. However, this approach is less suitable for diagnosing malignant thyroid nodules with a benign appearance on ultrasound. In this study, we developed thyroid cancer polygenic risk scoring (PRS) to complement deep learning analysis of ultrasound images. When the output of the deep learning model was combined with thyroid cancer PRS and genetic ancestry estimates, the area under the receiver operating characteristic curve (AUROC) of the benign vs. malignant thyroid nodule classifier increased from 0.83 to 0.89 (DeLong, p-value = 0.007). The combined deep learning and genetic classifier achieved a clinically relevant sensitivity of 0.95, 95% CI [0.88-0.99], specificity of 0.63 [0.55-0.70], and positive and negative predictive values of 0.47 [0.41-0.58] and 0.97 [0.92-0.99], respectively. An improved AUROC was consistent in ancestry-stratified analysis in Europeans (0.83 and 0.87 for deep-learning and deep learning combined with PRS classifiers, respectively). An elevated PRS was associated with a greater risk of thyroid cancer structural disease recurrence (ordinal logistic regression, p-value = 0.002). This study demonstrates that augmenting ultrasound image analysis with PRS improves diagnostic accuracy, paving the way for developing the next generation of clinical risk stratification algorithms incorporating inherited risk for developing thyroid malignancy.

Thyroid Cancer Polygenic Risk Score Improves Classification of Thyroid Nodules as Benign or Malignant
The Journal of Clinical Endocrinology & Metabolism · 2023-09-08 · 21 citations
article · Open access
CONTEXT: Thyroid nodule ultrasound-based risk stratification schemas rely on the presence of high-risk sonographic features. However, some malignant thyroid nodules have benign appearance on thyroid ultrasound. New methods for thyroid nodule risk assessment are needed. OBJECTIVE: We investigated polygenic risk score (PRS) accounting for inherited thyroid cancer risk combined with ultrasound-based analysis for improved thyroid nodule risk assessment. METHODS: The convolutional neural network classifier was trained on thyroid ultrasound still images and cine clips from 621 thyroid nodules. Phenome-wide association study (PheWAS) and PRS PheWAS were used to optimize PRS for distinguishing benign and malignant nodules. PRS was evaluated in 73 346 participants in the Colorado Center for Personalized Medicine Biobank. RESULTS: When the deep learning model output was combined with thyroid cancer PRS and genetic ancestry estimates, the area under the receiver operating characteristic curve (AUROC) of the benign vs malignant thyroid nodule classifier increased from 0.83 to 0.89 (DeLong, P value = .007). The combined deep learning and genetic classifier achieved a clinically relevant sensitivity of 0.95, 95% CI [0.88-0.99], specificity of 0.63 [0.55-0.70], and positive and negative predictive values of 0.47 [0.41-0.58] and 0.97 [0.92-0.99], respectively. AUROC improvement was consistent in European ancestry-stratified analysis (0.83 and 0.87 for deep learning and deep learning combined with PRS classifiers, respectively). Elevated PRS was associated with a greater risk of thyroid cancer structural disease recurrence (ordinal logistic regression, P value = .002). CONCLUSION: Augmenting ultrasound-based risk assessment with PRS improves diagnostic accuracy.
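The results above quote sensitivity, specificity, and positive and negative predictive values. As a reminder of how these four quantities relate to a confusion matrix, here is a small plain-Python sketch. The counts are hypothetical, chosen only to show how a classifier can pair high sensitivity and NPV with a modest PPV, and they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules correctly flagged
        "specificity": tn / (tn + fp),  # benign nodules correctly cleared
        "ppv": tp / (tp + fp),          # flagged nodules that are malignant
        "npv": tn / (tn + fn),          # cleared nodules that are benign
    }


# Hypothetical counts for 100 nodules (assumption, not from the paper).
m = diagnostic_metrics(tp=19, fp=21, tn=59, fn=1)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With few missed cancers and many false alarms, a negative result is highly reassuring while a positive result still needs follow-up, which is the usual trade-off for a rule-out test.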

Improving Pharmacovigilance Signal Detection from Clinical Notes with Locality Sensitive Neural Concept Embeddings.
PubMed · 2022-01-01 · 3 citations
article · Open access
Although pharmaceutical products undergo clinical trials to profile efficacy and safety, some adverse drug reactions (ADRs) are only discovered after release to market. Post-market drug safety surveillance - pharmacovigilance - leverages information from various sources to proactively identify such ADRs. Clinical notes are one source of observational data that could assist this process, but their inherent complexity can obfuscate possible ADR signals. In previous research, embeddings trained on observational reports have improved detection of such signals over commonly used statistical measures. Moreover, neural embedding methods which further encode juxtapositional information have shown promise on analogical retrieval tasks, suggesting proximity-based alternatives to document-level modeling for signal detection. This work uses natural language processing and locality sensitive neural embeddings to increase ADR signal recovery from clinical notes, with AUCs of ~0.63-0.71. Constituting a ~50% increase over baselines, our method sets the state-of-the-art for these reference standards when solely leveraging clinical notes.

Recent grants

ITR: Events, Patterns, and Analysis: Forecasting International Conflict in the Twenty-First Century
NSF · $400k · 2002–2008

Frequent coauthors

Venkataraman Subramanian
Leeds Teaching Hospitals NHS Trust
26 shared
Anita Deswal
The University of Texas MD Anderson Cancer Center
25 shared
Douglas L. Mann
Washington University in St. Louis
25 shared
Trevor Cohen
21 shared
Robert M. Stein
Rice University
15 shared
Leonardo Dueñas‐Osorio
15 shared
Justin Mower
Rice University
13 shared
Keith D. Cooper
13 shared

Education

Ph.D., Computer Science
Stanford University
1989
B. Tech., Computer Science and Engineering
Indian Institute of Technology Kharagpur
1982

Awards & honors

Invited Plenary Speaker, Heart Failure Society of America (2…
Microsoft Research ERP Board Member (2006-2008)
Invited Plenary Speaker, International Joint Conference on A…
Invited Speaker, Grace Murray Hopper Conference (2007)
CRA Distinguished Lecturer, University of Washington (2002)
