Robert Grossman

Professor of Computer Science · Verified

University of Chicago · Computer Science

Active 1978–2025

h-index: 67
Citations: 26.2k
Papers: 447 (125 in the last 5 years)
Funding: $111.3M
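For readers unfamiliar with the metric, the h-index is the largest number h such that h papers each have at least h citations. The sketch below illustrates that definition only. The function name and the citation counts are hypothetical, and this is not necessarily how this site computes the figure shown above.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts, for illustration only.
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))  # 4
```

An h-index of 67 therefore means that 67 of the listed papers have at least 67 citations each.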

About

Robert Grossman is the Frederick H. Rawson Distinguished Service Professor in Medicine and Computer Science at the University of Chicago, where he is a faculty member in the Department of Computer Science. His work emphasizes interdisciplinary applications of computer science and combines research with experiential education. The recent publications listed below cover data commons architecture, interoperability between biomedical data repositories, and machine learning for cancer diagnosis and precision oncology.

Research topics

  • Biology
  • Genetics
  • Computer Science
  • Computational biology
  • Data Mining
  • Evolutionary biology
  • Artificial Intelligence
  • Information Retrieval
  • Machine Learning
  • Computer Security
  • Medicine
  • Political Science
  • Database
  • Virology
  • Internal medicine
  • Data science
  • Mathematics
  • World Wide Web
  • Cancer research
  • Pathology
  • Knowledge management
  • Business
  • Statistics

Selected publications

  • A Proposed End-To-End Principle for Data Commons

    ArXiv.org · 2025-02-17

    preprint · Open access · 1st author · Corresponding

    A data commons brings together (or co-locates) data with cloud computing infrastructure and commonly used software services, tools and applications for managing, analyzing and sharing data to create an interoperable resource for a research community. We introduce an architectural design principle for data commons called the narrow middle architecture that is broadly based upon the end-to-end argument in systems design. We also discuss important core services for data commons and the role of standards.

  • Table 2 from AI-assisted Diagnosis of Nonmelanoma Skin Cancer in Resource-Limited Settings

    2025-07-01

    preprint · Open access

    Dataset characteristics broken down by NMSC subtype at the individual patient level and the biopsy level.

  • Query Augmented Generation (QAG) from the Genomic Data Commons for Accurate Variant Statistics

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-07

    preprint · Open access · Senior author

    In precision oncology, researchers often use public knowledgebases to check somatic variant frequencies against their cohort data. Large language models (LLMs) can quickly answer questions on somatic variant frequencies, but often hallucinate and give inaccurate results for factual data. Using synthetic queries, we show that somatic variant frequencies in baseline LLM responses are underestimated compared to the Genomic Data Commons (GDC), the world’s largest data commons for cancer research. We present a modular architecture called Query Augmented Generation (QAG) for integrating LLMs with high-quality data from a third party data source such as a data commons, knowledgebase or database. We apply QAG to the GDC to help researchers obtain accurate frequencies for somatic variants, copy number variants, and MSI status—even for complex queries requiring multiple steps in the GDC portal and API. Our software is deployed as a model context protocol (MCP) server on Hugging Face and available on GitHub.

  • Building digital histology models of transcriptional tumor programs with generative deep learning for pathology-based precision medicine

    Genome Medicine · 2025-08-07 · 2 citations

    article · Open access

    BACKGROUND: Precision oncology depends on identifying the biological vulnerabilities of a tumor. Molecular assays, like transcriptomics, provide an information-rich view of the tumor that can be leveraged to inform therapeutic selection. However, the costs of such assays can be prohibitive for clinical translation at scale. Histology-based imaging remains a predominant means of diagnosis that is widely accessible. To more broadly leverage limited molecular datasets, models have been trained to use histology to infer the expression of individual genes or pathways, with varying levels of accuracy and explainability.

    METHODS: Our approach detects expression of transcriptional programs from tumor histology and interprets the image features supporting program detection. Specifically, we used RNA-seq data from squamous cell carcinoma (SCC) patients to infer cohesive expression patterns of multiple genes. Then, we used deep learning techniques to train a computational model to predict the activity levels of the transcriptional programs directly from histology images. We exploited that predictive capability to generate synthetic digital models of the cellular histology of each transcriptional program, using generative adversarial networks to isolate image features supporting specific transcriptional predictions and pathologist review to interpret the images.

    RESULTS: Applying our histologically integrated latent space analysis to SCCs revealed sets of genes associated with both pathologist-interpretable image features and clinically relevant processes, including immune response, collagen remodeling, and fibrosis, going beyond predictions of individual molecular features.

    CONCLUSIONS: Our results demonstrate an approach for discovering clinically interpretable histological features that indicate molecular, potentially treatment-informing, biological processes. These features are detectable in widely available histology slides, allowing a standard microscope to deliver complex, patient-specific molecular information.

  • Demonstration of Interoperability Between MIDRC and N3C: A COVID-19 Severity Prediction Use Case

    Journal of Imaging Informatics in Medicine · 2025-08-14

    article · Open access

    Interoperability between data sources, one of the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management, can enable multi-modality research. The purpose of our study was to investigate the potential for interoperability between an imaging resource, the Medical Imaging and Data Resource Center (MIDRC), and a clinical record resource, the National COVID Cohort Collaborative (N3C). The use case was the prediction of COVID-19 severity, defined as evidence for invasive ventilatory support, extracorporeal membrane oxygenation, death, or discharge to hospice in the N3C clinical record. Patient-level matching between MIDRC and N3C was identified using Privacy Preserving Record Linking via an honest broker. We identified positive COVID-19 tests and chest radiograph procedures in N3C and used the interval between them to identify images with matching intervals in MIDRC. Of the 236 patients (306 unique images) meeting initial inclusion criteria in MIDRC, 117 patients (and 139 unique images) remained after date interval matching between repositories and exclusion of patients with multiple potential matches. The Charlson Comorbidity Index (CCI) and the minimum mean arterial pressure (MAP) on the day of the chest radiograph were used as clinical indicators. The AUC in the task of predicting severe COVID-19 was evaluated using the computer-extracted imaging index alone (MIDRC), clinical indicators alone (N3C), and both together. Our model combining imaging and clinical indicators (CCI over 2 and MAP below 70) to predict severe COVID had an AUC of 0.73 (95% CI 0.62-0.84), and the models including imaging or clinical indicators alone were 0.67 (95% CI 0.56-0.79) and 0.69 (95% CI 0.59-0.80), respectively. This study highlights the potential for cross-platform data sharing to facilitate future multi-modality research and broader collaborative studies.

  • Data from AI-assisted Diagnosis of Nonmelanoma Skin Cancer in Resource-Limited Settings

    2025-07-01

    preprint · Open access

    Background: Early and precise diagnosis is vital to improving patient outcomes and reducing morbidity. In resource-limited settings, cancer diagnosis is often challenging due to shortages of expert pathologists. We assess the effectiveness of general-purpose pathology foundation models (FM) for the diagnosis and annotation of nonmelanoma skin cancer (NMSC) in resource-limited settings.

    Methods: We evaluated three pathology FMs (UNI, PRISM, and Prov-GigaPath) using deidentified NMSC histology images from the Bangladesh Vitamin E and Selenium Trial to predict cancer subtype based on zero-shot whole-slide embeddings. In addition, we evaluated tile aggregation methods and machine learning models for prediction. Lastly, we employed few-shot learning of PRISM tile embeddings to perform whole-slide annotation.

    Results: We found that the best model used PRISM’s aggregated tile embeddings to train a multilayer perceptron model to predict NMSC subtype [mean area under the receiver operating characteristic curve (AUROC) = 0.925, P < 0.001]. Within the other FMs, we found that using attention-based multi-instance learning to aggregate tile embeddings to train a multilayer perceptron model was optimal (UNI: mean AUROC = 0.913, P < 0.001; Prov-GigaPath: mean AUROC = 0.908, P < 0.001). We finally exemplify the utility of few-shot annotation in computation- and expertise-limited settings.

    Conclusions: Our study highlights the important role FMs may play in confronting public health challenges and exhibits a real-world potential for machine learning–aided cancer diagnosis.

    Impact: Pathology FMs offer a promising pathway to improve early and precise NMSC diagnosis, especially in resource-limited environments. These tools could also facilitate patient stratification and recruitment for prospective clinical trials aimed at improving NMSC management.

  • Abstract 1141: Improved diagnosis of non-melanoma skin cancer in resource-limited settings

    Cancer Research · 2025-05-22

    article

    Background: In resource-limited settings, cancer diagnosis is often challenging due to a shortage of expert pathologists. Early and precise diagnosis is vital to enhancing treatment outcomes and reducing morbidity of patients. This issue is particularly prevalent in regions like Bangladesh, where high levels of arsenic exposure increase the risk of non-melanoma skin cancer (NMSC). Here, we assess the effectiveness of general-purpose pathology foundation models (FMs) for diagnosis of NMSC. In particular, we aim to determine if FMs have value in resource-constrained environments, such that encodings from scanned whole slide images can be used with simple classification models that can run efficiently in resource-limited settings.

    Method: Using 5-fold cross validation, we evaluated three pathology foundation models (UNI, PRISM, and Prov-GigaPath) as well as a ResNet18 baseline model using de-identified NMSC data from the Bangladesh Vitamin E and Selenium Trial (BEST) by the Institute for Population and Precision Health at the University of Chicago. This data contained 2,130 hematoxylin and eosin (H&E)-stained whole slide images from 553 suspected NMSC biopsy samples from 455 participants. The slides included normal tissue (n=706), Bowen’s disease (n=638), basal cell carcinoma (n=575), and invasive squamous cell carcinoma (n=211). In addition to comparing the three FMs for generating tile embeddings in a zero-shot framework, we also evaluated tile aggregation methods, including global average pooling (GAP), attention-based multi-instance learning (ABMIL), and methods specific to respective FMs. Lastly, we compared the performance of logistic regression, XGBoost, and shallow multilayer perceptron neural networks (MLP) for the prediction of cancer subtype from the slide embedding.

    Results: All three FMs significantly outperformed ResNet18 (mean AUROC=0.805; p<0.001). We found that the overall best model used the PRISM tile embeddings aggregated using PRISM’s intrinsic Perceiver network to train an MLP model to predict NMSC subtype (mean AUROC=0.925; p<0.001). Within the other FMs specifically, we found that using ABMIL to aggregate tile embeddings to train an MLP model was optimal for both UNI (mean AUROC=0.913; p<0.001) and Prov-GigaPath (mean AUROC=0.908, p<0.001). We also found the simplest method with logistic regression of GAP aggregated embeddings was able to attain reasonable results for PRISM (mean AUROC=0.882), UNI (mean AUROC=0.865), and Prov-GigaPath (mean AUROC=0.855).

    Conclusion: Our study highlights the importance of innovation in confronting public health challenges and exhibits a real-world potential for machine learning aided cancer diagnosis. We demonstrate how leveraging whole slide embeddings from pre-trained foundation models can provide considerable potential for the improvement of treatment outcomes and patient survival rates, especially in resource-limited settings.

    Citation Format: Spencer Ellis, Steven Song, Derek Reiman, Xuan Hui, Renyu Zhang, Mohammad H. Shahriar, Mohammed Kamal, Christopher R. Shea, Robert L. Grossman, Aly A. Khan, Habibul Ahsan. Improved diagnosis of non-melanoma skin cancer in resource-limited settings [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 1141.

  • Multimodal data curation via interoperability: use cases with the Medical Imaging and Data Resource Center

    Scientific Data · 2025-08-01 · 3 citations

    article · Open access

    Interoperability (the ability of data or tools from non-cooperating resources to integrate or work together with minimal effort) is particularly important for curation of multimodal datasets from multiple data sources. The Medical Imaging and Data Resource Center (MIDRC), a multi-institutional collaborative initiative to collect, curate, and share medical imaging datasets, has made interoperability with other data commons one of its top priorities. The purpose of this study was to demonstrate the interoperability between MIDRC and two other data repositories, BioData Catalyst (BDC) and National Clinical Cohort Collaborative (N3C). Using interoperability capabilities of the data repositories, we built two cohorts for example use cases, with each containing clinical and imaging data on matched patients. The representativeness of the cohorts is characterized by comparing with CDC population statistics using the Jensen-Shannon distance. The process and methods of interoperability demonstrated in this work can be utilized by MIDRC, BDC, and N3C users to create multimodal datasets for development of artificial intelligence/machine learning models.

  • Recommended Clinical Context and Patient Context Data Elements for Liquid Biopsy Data Submitted to Data Repositories and Data Commons

    Clinical and Translational Science · 2025-04-01 · 2 citations

    article · Open access · 1st author

    In 2020, BLOODPAC recommended 11 pre-analytical minimal technical data elements for collection and submission of liquid biopsy data to public databases. This article expands on that work by recommending 22 clinical context and 10 patient context data elements. These elements, essential for liquid biopsy data submitted to repositories like the BLOODPAC Data Commons, cover tumor characteristics, disease progression, and patient demographics, supporting biomarker validation, research, and clinical trials.

  • Table 1 from AI-assisted Diagnosis of Nonmelanoma Skin Cancer in Resource-Limited Settings

    2025-07-01

    preprint · Open access

    Important concepts and terms used in this study.
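The Scientific Data entry in the list above characterizes cohort representativeness with the Jensen-Shannon distance. As a rough illustration of that measure (not the authors' code), the sketch below computes it for two discrete distributions as the square root of the Jensen-Shannon divergence, using base-2 logarithms so the result lies between 0 and 1. The function name, the category layout, and the example proportions are all hypothetical.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of categories")
    # Normalize so each input sums to 1 (assumes non-negative weights, not all zero).
    p = [x / sum(p) for x in p]
    q = [x / sum(q) for x in q]
    # Mixture distribution M = (P + Q) / 2.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, divergence))


# Hypothetical proportions over three made-up categories, for illustration only.
cohort = [0.50, 0.30, 0.20]
reference = [0.40, 0.35, 0.25]
print(js_distance(cohort, reference))
```

A value of 0 means the two distributions are identical, and a value of 1 means they share no categories with nonzero probability.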

Frequent coauthors

  • Rory Johnson

    University Hospital of Bern

    98 shared
  • Roland Eils

    95 shared
  • Thomas J. Mitchell

    Wellcome Sanger Institute

    85 shared
  • Lars Feuerbach

    German Cancer Research Center

    78 shared
  • L. Sylvia

    Mirai Hospital

    76 shared
  • Keiran Raine

    Wellcome Sanger Institute

    75 shared
  • Geoff Macintyre

    Spanish National Cancer Research Centre

    74 shared
  • Allison P. Heath

    Children's Hospital of Philadelphia

    73 shared

Education

  • Ph.D., Computer Science

    University of Chicago

    1985
  • M.S., Computer Science

    University of Chicago

    1981
  • B.S., Mathematics

    University of Chicago

    1977

Awards & honors

  • Nick Feamster Receives 2026 Quantrell Teaching Award
  • Brennan Schaffner Receives ACM SIGCHI Special Recognition Aw…