Halil Kilicoglu
Associate Professor · University of Illinois Urbana-Champaign · Computer Science
Active 2003–2026
About
Halil Kilicoglu is an Associate Professor at the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign. He can be reached by phone at 217-333-0187 or by email at halil@illinois.edu. His recent courses include IS 515 (Information Modeling), IS 549 (Practicum), IS 567 (Text Mining), IS 577 (Data Mining), IS 597 (Large Language Models), and NUTR 500 (AI in Nutrition and Health). The Siebel School of Computing and Data Science is part of The Grainger College of Engineering.
Research topics
- Computer Science
- Data science
- Artificial Intelligence
- Data Mining
- Information Retrieval
- Medicine
- Computer Security
- Genetics
- Engineering
- Theoretical computer science
- Internal medicine
- Computational biology
- Medical physics
- Biology
- Pharmacology
Selected publications
Journal of Nutrition · 2026-03-03
Article · Open access · Senior author
BACKGROUND: Complementary and integrative health (CIH) interventions, including nutritional strategies, are widely used to support whole-person health, yet evidence on their efficacy, safety, and mechanisms remains fragmented. OBJECTIVES: This narrative review mapped existing CIH knowledge resources, identified critical gaps, and highlighted challenges in interoperability and integration. We proposed artificial intelligence-driven informatics strategies to standardize, connect, and leverage these resources, with the goal of advancing discovery, precision nutrition, and personalized approaches to health and well-being. METHODS: We conducted a narrative review of publicly available knowledge resources on complementary health interventions, focusing on their effectiveness, safety, and biological mechanisms, including the microbiome. Interventions were categorized as nutritional, physical, or psychological. Resources were then classified as knowledge bases, datasets, databases, ontologies, knowledge graphs, platforms, or initiatives, with summaries of their scope, functionality, and contributions. RESULTS: We identified 47 resources that can support complementary and integrative health informatics (15 knowledge bases, 13 databases, 7 datasets, 4 platforms, 3 initiatives, 3 ontologies, and 2 knowledge graphs). Categories included nutritional interventions (32, with 13 on the microbiome), physical interventions (4), psychological interventions (3), and comprehensive or multimodal resources (7). Most resources (39) were publicly available. CONCLUSIONS: Advancing whole-person health requires greater standardization and integration of knowledge resources, which in turn enables more effective application of artificial intelligence and informatics methods.
When well-structured, interoperable resources are coupled with these computational methods, they can unify diverse knowledge domains, advance the science of complementary and integrative health, and accelerate discovery in personalized nutrition.
Publication Type Tagging using Transformer Models and Multi-Label Classification
medRxiv · 2025-03-07 · 5 citations
Preprint · Open access
Indexing articles by their publication type and study design is essential for efficient search and filtering of the biomedical literature, but is understudied compared to indexing by MeSH topical terms. In this study, we leveraged the human-curated publication types and study designs in PubMed to generate a dataset of more than 1.2M articles (titles and abstracts) and used state-of-the-art Transformer-based models for automatic tagging of publication types and study designs. Specifically, we trained PubMedBERT-based models using a multi-label classification approach, and explored undersampling, feature verbalization, and contrastive learning to improve model performance. Our results show that PubMedBERT provides a strong baseline for publication type and study design indexing; undersampling, feature verbalization, and unsupervised contrastive loss have a positive impact on performance, whereas supervised contrastive learning degrades the performance. We obtained the best overall performance with 80% undersampling and feature verbalization (0.632 macro-F1, 0.969 macro-AUC). The model outperformed previous models (MultiTagger) across all metrics and the performance difference was statistically significant (p < 0.001). Despite its stronger performance, the model still has room for improvement and future work could explore features based on full-text as well as model interpretability. We make our data and code available at https://github.com/ScienceNLP-Lab/MultiTagger-v2/tree/main/AMIA.
medRxiv · 2025-04-25 · 2 citations
Preprint · Open access
Objective: Searching for biomedical articles by publication type or study design is essential for tasks like evidence synthesis. Prior work has relied solely on PubMed information or addressed a limited set of types (e.g., randomized controlled trials). In this study, we build on previous work by leveraging full-text features, enriched text representations, and advanced optimization techniques for comprehensive indexing. Methods: Using a dataset of PubMed articles published between 1987 and 2023 with human-annotated indexing terms, we fine-tuned BERT-based encoders (PubMedBERT, BioLinkBERT, SPECTER, SPECTER2-Base, SPECTER2-Clf) to investigate whether text representations based on different pre-training objectives could benefit the task. We incorporated textual and verbalized metadata features, full-text extraction (rule-based, extractive, and abstractive summarization), and additional topical information about the articles. To mitigate potential label noise and improve calibration, we used asymmetric loss and label smoothing. We also explored contrastive learning approaches (SimCSE, ADNCE, HeroCon, WeighCon). Models were evaluated using precision, recall, F1 score (both micro- and macro-), and area under ROC curve (AUC). Results: Fine-tuning SPECTER2-Base with asymmetric loss, label smoothing and contrastive learning (ADNCE and HeroCon) improved performance significantly over the previous best model (micro-F1: 0.658 → 0.670 [+1.8%]; macro-F1: 0.643 → 0.677 [+5.3%]; p < 0.001). Asymmetric loss and using SPECTER2-Base instead of PubMedBERT contributed most to this gain, while contrastive learning provided more moderate gains. Full-text features boosted performance by 2.4% (micro-F1) and 0.8% (macro-F1) over the baseline (micro-F1: 0.656 → 0.672; macro-F1: 0.595 → 0.600; p < 0.001). Conclusion: Full-text features, citation-aware encoders, and fine-tuning optimizations significantly improve publication type and study design indexing. Future work should refine label accuracy, better distill relevant full-text information, and expand label sets to meet needs of the research community. Data, code, and models are available at https://github.com/ScienceNLP-Lab/MultiTagger-v2.
Highlights:
- We trained and validated Transformer-based models for automatic indexing of publication types and study designs in biomedical articles, using a dataset with 61 labels derived primarily from expert-assigned PubMed indexing terms.
- We investigated whether enriched article representations, advanced optimization techniques, and fine-grained labels could enhance model performance.
- The largest performance improvement came from using citation-aware article representations and asymmetric loss.
- Models trained using full-text features outperformed models trained using PubMed-only features, demonstrating the utility of full-text content for this task.
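The abstracts above report both micro- and macro-averaged F1 for multi-label tagging. As a rough illustration of how the two averages differ, here is a minimal Python sketch. It is not the authors' evaluation code: the function names, the three example labels, and the toy gold and predicted label sets are all hypothetical, and the sketch assumes that a label with no true positives, false positives, or false negatives scores 0.

```python
from typing import Dict, List, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there is nothing to score (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for multi-label predictions over a fixed label set."""
    # Per-label counts: [true positives, false positives, false negatives].
    counts: Dict[str, List[int]] = {lab: [0, 0, 0] for lab in labels}
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in p and lab in g:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1
    # Micro: pool the counts over all labels, then compute one F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    return micro, macro


# Hypothetical toy data: four articles, three publication-type labels.
labels = ["RCT", "Review", "Case Report"]
gold = [{"RCT"}, {"RCT"}, {"Review"}, {"Case Report"}]
pred = [{"RCT"}, {"RCT"}, {"Review"}, {"Review"}]
micro, macro = micro_macro_f1(gold, pred, labels)
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")
```

In this toy case the rare label is missed entirely, which lowers the macro average much more than the micro average. This is one reason results on imbalanced label sets are often reported with both.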
ArXiv.org · 2025-07-23
Preprint · Open access
The causes of the reproducibility crisis include lack of standardization and transparency in scientific reporting. Checklists such as ARRIVE and CONSORT seek to improve transparency, but they are not always followed by authors and peer review often fails to identify missing items. To address these issues, there are several automated tools that have been designed to check different rigor criteria. We have conducted a broad comparison of 11 automated tools across 9 different rigor criteria from the ScreenIT group. We found some criteria, including detecting open data, where the combination of tools showed a clear winner, a tool which performed much better than other tools. In other cases, including detection of inclusion and exclusion criteria, the combination of tools exceeded the performance of any one tool. We also identified key areas where tool developers should focus their effort to make their tool maximally useful. We conclude with a set of insights and recommendations for stakeholders in the development of rigor and transparency detection tools. The code and data for the study are available at https://github.com/PeterEckmann1/tool-comparison.
Towards Knowledge-Guided Biomedical Lay Summarization using Large Language Models
2025-01-01 · 1 citation
Article · Open access · Senior author
The massive size, continual growth, and technical jargon in biomedical publications make it difficult for laypeople to stay informed about the latest scientific advances, motivating research on lay summarization of biomedical literature. Large language models (LLMs) are increasingly used for this task. Unlike typical automatic summarization, lay summarization requires incorporating background knowledge not found in a paper and explanations of technical jargon. This study explores the use of MeSH terms (Medical Subject Headings), which represent an article's main topics, to enhance background information generation in biomedical lay summarization. Furthermore, we introduced a multi-turn dialogue approach that more effectively leverages MeSH terms in the instruction-tuning of LLMs to enhance the quality of lay summaries. The best model improved the state-of-the-art on the eLife test set in terms of the ROUGE-1 score by nearly 2%, with competitive scores in other metrics. These results indicate that MeSH terms can guide LLMs to generate more relevant background information for laypeople. Additionally, evaluation on a held-out dataset, one that was not used during model pre-training, shows that this capability generalizes well to unseen data, further demonstrating the effectiveness of our approach.
DiMB-RE: mining the scientific literature for diet-microbiome associations
Journal of the American Medical Informatics Association · 2025-03-27 · 7 citations
Article · Open access · Senior author
OBJECTIVES: To develop a corpus annotated for diet-microbiome associations from the biomedical literature and train natural language processing (NLP) models to identify these associations, thereby improving the understanding of their role in health and disease, and supporting personalized nutrition strategies. MATERIALS AND METHODS: We constructed DiMB-RE, a comprehensive corpus annotated with 15 entity types (eg, Nutrient, Microorganism) and 13 relation types (eg, increases, improves) capturing diet-microbiome associations. We fine-tuned and evaluated state-of-the-art NLP models for named entity, trigger, and relation extraction as well as factuality detection using DiMB-RE. In addition, we benchmarked 2 generative large language models (GPT-4o-mini and GPT-4o) on a subset of the dataset in zero- and one-shot settings. RESULTS: DiMB-RE consists of 14 450 entities and 4206 relationships from 165 publications (including 30 full-text Results sections). Fine-tuned NLP models performed reasonably well for named entity recognition (0.800 F1 score), while end-to-end relation extraction performance was modest (0.445 F1). The use of Results section annotations improved relation extraction. The impact of trigger detection was mixed. Generative models showed lower accuracy compared to fine-tuned models. DISCUSSION: To our knowledge, DiMB-RE is the largest and most diverse corpus focusing on diet-microbiome interactions. Natural language processing models fine-tuned on DiMB-RE exhibit lower performance compared to similar corpora, highlighting the complexity of information extraction in this domain. Misclassified entities, missed triggers, and cross-sentence relations are the major sources of relation extraction errors. CONCLUSION: DiMB-RE can serve as a benchmark corpus for biomedical literature mining. DiMB-RE and the NLP models are available at https://github.com/ScienceNLP-Lab/DiMB-RE.
Investigating Information Propagation in Biomedical Literature through Citations: A Case Study
2025-07-10
Article · Open access · Senior author
Scientific growth is iterative, with existing knowledge serving as the foundation for new discoveries. Citations serve as the primary channel for information propagation in science, shaping which ideas and findings persist in the literature and which do not. While natural language processing (NLP) is increasingly used in citation context analysis, it is underutilized in studies that examine the actual scientific content of citations. In this pilot study, we explored how NLP can be used to track the propagation of scientific findings by replicating a prior citation context study that relied on manual extraction. We compared two approaches: a traditional NLP pipeline (named entity recognition and relation extraction) and a generative large language model (LLM). We formulated a two-step automated pipeline: (1) extracting findings from a reference paper and (2) mapping citation contexts to the findings they reference. Our findings indicate that LLMs are superior to traditional NLP techniques in both steps of the pipeline. However, they are also more prone to errors, mapping citation contexts to findings they do not reference. While the two-step automated pipeline was effective, integrating manual annotation of findings with LLM-based mapping of citation contexts yields the best results. To our knowledge, this study is one of the first to explore how NLP, particularly LLMs, can be leveraged to track the flow of information in science. Future research should further evaluate the application of LLMs and other NLP techniques on a larger scale to assess their effectiveness in supporting citation-focused scientometric and informetric studies.
BiomedRAG: A retrieval augmented large language model for biomedicine
Journal of Biomedical Informatics · 2025-01-13 · 43 citations
Article · Open access
Retrieval-augmented generation (RAG) retrieves knowledge from an established database to enhance the performance of large language models (LLMs). These models retrieve information at the sentence or paragraph level, potentially introducing noise and affecting the generation quality. To address these issues, we propose a novel BiomedRAG framework that directly feeds automatically retrieved chunk-based documents into the LLM. Our evaluation of BiomedRAG across four biomedical natural language processing tasks using eight datasets demonstrates that our proposed framework not only improves the performance by 9.95% on average, but also achieves state-of-the-art results, surpassing various baselines by 4.97%. BiomedRAG paves the way for more accurate and adaptable LLM applications in the biomedical domain.
- We proposed BiomedRAG, a novel chunk retrieval framework for biomedical NLP tasks.
- We used the LLM scores as a supervision signal to train the proposed chunk scorer.
- We validated it on four biomedical NLP tasks on eight datasets using various LLMs.
medRxiv · 2025-01-15
Preprint · Open access · Senior author · Corresponding author
Randomized controlled trials (RCTs) can produce valid estimates of the benefits and harms of therapeutic interventions. However, incomplete reporting can undermine the validity of their conclusions. Reporting guidelines, such as SPIRIT for protocols and CONSORT for results, have been developed to improve transparency in RCT publications. In this study, we report a corpus of 200 RCT publications, named SPIRIT-CONSORT-TM, annotated for transparency. We used a comprehensive data model that includes 83 items from SPIRIT and CONSORT checklists for annotation. Inter-annotator agreement was calculated for 30 pairs. The dataset includes 26,613 sentences annotated with checklist items and 4,231 terms. We also trained natural language processing (NLP) models that automatically identify these items in publications. The sentence classification model achieved 0.742 micro-F1 score (0.865 at the article level). The term extraction model yielded 0.545 and 0.663 micro-F1 score in strict and lenient evaluation, respectively. The corpus serves as a benchmark to train models that assist stakeholders of clinical research in maintaining high reporting standards and synthesizing information on study rigor and conduct.
Scientific Data · 2025-02-27 · 5 citations
Article · Open access · Senior author
Randomized controlled trials (RCTs) can produce valid estimates of the benefits and harms of therapeutic interventions. However, incomplete reporting can undermine the validity of their conclusions. Reporting guidelines, such as SPIRIT for protocols and CONSORT for results, have been developed to improve transparency in RCT publications. In this study, we report a corpus of 200 RCT publications, named SPIRIT-CONSORT-TM, annotated for transparency. We used a comprehensive data model that includes 83 items from SPIRIT and CONSORT checklists for annotation. Inter-annotator agreement was calculated for 30 pairs. The dataset includes 26,613 sentences annotated with checklist items and 4,231 terms. We also trained natural language processing (NLP) models that automatically identify these items in publications. The sentence classification model achieved 0.742 micro-F1 score (0.865 at the article level). The term extraction model yielded 0.545 and 0.663 micro-F1 score in strict and lenient evaluation, respectively. The corpus serves as a benchmark to train models that assist stakeholders of clinical research in maintaining high reporting standards and synthesizing information on study rigor and conduct.
Frequent coauthors
- Marcelo Fiszman (Pontifical Catholic University of Rio de Janeiro) · 64 shared
- Thomas C. Rindflesch · 60 shared
- Graciela Rosemblat (National Institutes of Health) · 45 shared
- Dina Demner‐Fushman (United States National Library of Medicine) · 34 shared
- Gerben ter Riet (Amsterdam University Medical Centers) · 20 shared
- Michael J. Cairelli (Kaiser Permanente) · 14 shared
- Rui Zhang · 14 shared
- Dongwook Shin (Samsung Medical Center) · 13 shared
Education
- 2005 · Ph.D., Computer Science · University of Illinois at Urbana-Champaign
- 2001 · M.S., Computer Science · University of Illinois at Urbana-Champaign
- 1998 · B.S., Computer Engineering · Middle East Technical University