
Jaideep Srivastava
Verified · University of Minnesota · Computer Science and Engineering
Active 1986–2026
About
Jaideep Srivastava is a professor in the Department of Computer Science & Engineering at the University of Minnesota Twin Cities, where he also serves as the Director of Undergraduate Studies for Data Science. He joined the department in 1988 after completing his M.S. and Ph.D. in Computer Science at the University of California, Berkeley; he earlier received a B.Tech. in Computer Science from the Indian Institute of Technology Kanpur. Srivastava's research interests encompass databases, data mining, and multimedia systems. His work in data science and machine learning focuses on developing predictive models and analyzing large-scale social networks. He is the co-founder and president of Ninja Metrics, Inc., a startup specializing in analytics for social systems and social media. His professional experience includes roles such as Research Director and Chief Scientist at Qatar Computing Research Institute, as well as industry positions including Chief Technology Officer for Persistent Systems, Director of Data Analytics for Yodlee, Inc., and data mining architect for Amazon. Srivastava has been recognized with awards such as the IEEE Fellowship and the PAKDD Distinguished Contributions Award, and he has been involved in numerous research projects related to health, social influence, and large-scale social network analysis.
Research topics
- Political Science
- Computer Science
- Computer Security
- Business
- Medicine
- Psychiatry
- Engineering
- Human–computer interaction
- Internet privacy
- Physical therapy
- Law
- Internal medicine
- Commerce
- Pediatrics
- Marketing
- Advertising
Selected publications
Automated Auditing of Hospital Discharge Summaries for Care Transitions
ArXiv.org · 2026-04-07
article · Open access · Senior author
Incomplete or inconsistent discharge documentation is a primary driver of care fragmentation and avoidable readmissions. Despite its critical role in patient safety, auditing discharge summaries relies heavily on manual review and is difficult to scale. We propose an automated framework for large-scale auditing of discharge summaries using locally deployed Large Language Models (LLMs). Our approach operationalizes core transition-of-care requirements, such as follow-up instructions, medication history and changes, patient information, and clinical course, into a structured validation checklist of questions based on the DISCHARGED framework. Using adult inpatient summaries from the MIMIC-IV database, we utilize a privacy-preserving LLM to identify the presence, absence, or ambiguity of key documentation elements. This work demonstrates the feasibility of scalable, automated clinical auditing and provides a foundation for systematic quality improvement in electronic health record documentation.
ArXiv.org · 2026-04-07
article · Open access · Senior author
Incorrect information poses significant challenges by disrupting content veracity and integrity, yet most detection approaches struggle to jointly balance textual content verification with external knowledge modification under collapsed attention geometries. To address this issue, we propose a dual-head reasoning framework, BiMind, which disentangles content-internal reasoning from knowledge-augmented reasoning. In BiMind, we introduce three core innovations: (i) an attention geometry adapter that reshapes attention logits via token-conditioned offsets and mitigates attention collapse; (ii) a self-retrieval knowledge mechanism, which constructs an in-domain semantic memory through kNN retrieval and injects retrieved neighbors via feature-wise linear modulation; (iii) uncertainty-aware fusion strategies, including entropy-gated fusion and a trainable agreement head, stabilized by a symmetric Kullback-Leibler agreement regularizer. To quantify the knowledge contributions, we define a novel metric, Value-of-eXperience (VoX), to measure instance-wise logit gains from knowledge-augmented reasoning. Experimental results on public datasets demonstrate that our BiMind model outperforms advanced detection approaches and provides interpretable diagnostics on when and why knowledge matters.
Evaluating Reasoning-Based Scaffolds for Human-AI Co-Annotation: The ReasonAlign Annotation Protocol
arXiv (Cornell University) · 2026-03-22
article · Open access · Senior author
Human annotation is central to NLP evaluation, yet subjective tasks often exhibit substantial variability across annotators. While large language models (LLMs) can provide structured reasoning to support annotation, their influence on human annotation behavior remains unclear. We introduce ReasonAlign, a reasoning-based annotation scaffold that exposes LLM-generated explanations while withholding predicted labels. We frame this as a controlled study of how reasoning affects human annotation behavior, rather than a full evaluation of annotation accuracy. Using a two-pass protocol inspired by Delphi-style revision, annotators first label instances independently and then revise their decisions after viewing model-generated reasoning. We evaluate the approach on sentiment classification and opinion detection tasks, analyzing changes in inter-annotator agreement and revision behavior. To quantify these effects, we introduce the Annotator Effort Proxy (AEP), a metric capturing the proportion of labels revised after exposure to reasoning. Our results show that exposure to reasoning is associated with increased agreement alongside minimal revision, suggesting that reasoning primarily helps resolve ambiguous cases without inducing widespread changes. These findings provide insight into how reasoning explanations shape annotation consistency and highlight reasoning-based scaffolds as a practical mechanism for supporting human-AI annotation workflows.
Data Skeleton Learning: Scalable active clustering with sparse graph structures
Pattern Recognition · 2025-12-29
article · Senior author
The Psychology of Falsehood: A Human-Centric Survey of Misinformation Detection
2025-01-01
article · Open access
Arghodeep Nandi, Megha Sundriyal, Euna Mehnaz Khan, Jikai Sun, Emily K. Vraga, Jaideep Srivastava, Tanmoy Chakraborty. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
Diagnostics · 2025-03-13 · 5 citations
article · Open access · Senior author
Background: Heart failure with reduced ejection fraction is a complex condition that necessitates adaptive, patient-specific management strategies. This study aimed to evaluate the effectiveness of a time-adaptive machine learning model, the Passive-Aggressive classifier, in predicting heart failure with reduced ejection fraction severity and capturing individualized disease progression. Methods: A time-adaptive Passive-Aggressive classifier was employed, using clinical data and Brain Natriuretic Peptide levels as class designators for heart failure with reduced ejection fraction severity. The model was personalized for individual patients by sequentially incorporating clinical visit data from 0–9 visits. The model’s adaptability and effectiveness in capturing individual health trajectories were assessed using accuracy and reliability metrics as more data were added. Results: With the progressive introduction of patient-specific data, the model demonstrated significant improvements in predictive capabilities. By incorporating data from nine visits, significant gains in accuracy and reliability were achieved, with the One-Versus-Rest AUC increasing from 0.4884 with no personalization (zero visits) to 0.8253 (nine visits). This demonstrates the model’s ability to handle diverse patient presentations and the dynamic nature of disease progression. Conclusions: The findings show the potential of time-adaptive machine learning models, particularly the Passive-Aggressive classifier, in managing heart failure with reduced ejection fraction and other chronic diseases. By enabling precise, patient-specific predictions, these approaches support early detection, tailored interventions, and improved long-term outcomes. This study highlights the feasibility of integrating adaptive models into clinical workflows to enhance the management of heart failure with reduced ejection fraction and similar chronic conditions.
Computational and Structural Biotechnology Journal · 2025-01-01 · 5 citations
review · Open access
Background: Obstructive sleep apnea (OSA) is a prevalent and potentially severe sleep disorder characterized by repeated interruptions in breathing during sleep. Machine learning models have been increasingly applied in various aspects of OSA research, including diagnosis, treatment optimization, and developing biomarkers for endotypes and disease mechanisms. Objective: This narrative review evaluates the application of machine learning in OSA research, focusing on model performance, dataset characteristics, demographic representation, and validation strategies. We aim to identify trends and gaps to guide future research and improve clinical decision-making that leverages machine learning. Methods: This narrative review examines data extracted from 254 scientific publications published in the PubMed database between January 2018 and March 2023. Studies were categorized by machine learning applications, models, tasks, validation metrics, data sources, and demographics. Results: Our analysis revealed that most machine learning applications focused on OSA classification and diagnosis, utilizing various data sources such as polysomnography, electrocardiogram data, and wearable devices. We also found that study cohorts were predominantly overweight males, with an underrepresentation of women, younger obese adults, individuals over 60 years old, and diverse racial groups. Many studies had small sample sizes and limited use of robust model validation. Conclusion: Our findings highlight the need for more inclusive research approaches, starting with adequate data collection in terms of sample size and bias mitigation for better generalizability of machine learning models in OSA research. Addressing these demographic gaps and methodological opportunities is critical for ensuring more robust and equitable applications of artificial intelligence in healthcare.
medRxiv · 2025-03-01 · 3 citations
review · Open access
Frequent coauthors
- 672 shared
Ee‐Peng Lim
Singapore Management University
- 651 shared
P. Krishna Reddy
Martin College
- 650 shared
Hiroshi Motoda
Osaka University
- 650 shared
Kyu-Young Whang
Thammasat University
- 649 shared
Graham Williams
Australian National University
- 649 shared
Jian Pei
Duke University
- 648 shared
Huan Liu
- 648 shared
Enhong Chen
University of Science and Technology of China
Labs
Jaideep Srivastava · PI
Education
- 1988
PhD, Electrical Engineering and Computer Science
University of California, Berkeley
- 1983
B.Tech., Computer Science
Indian Institute of Technology Kanpur
Awards & honors
- 2013: PAKDD Distinguished Contributions Award
- 2004: IEEE Fellow
- 2002: IBM Faculty Award