Rajesh Ranganath

New York University · Computer Science

Active 2008–2026

h-index: 41

Citations: 9.0k

Papers: 229 (127 last 5y)

Funding: $3.9M (2 active)

About

Rajesh Ranganath is an Assistant Professor at the Courant Institute at NYU in Computer Science and at the Center for Data Science. He is also part of the CILVR group. His research interests include causal, statistical, and probabilistic inference, out-of-distribution detection and generalization, deep generative modeling, interpretability, and machine learning for healthcare. Before joining NYU, he earned degrees in computer science, completing his PhD at Princeton University working with Dave Blei, and his undergraduate studies at Stanford University. He has also spent time as a research affiliate at MIT’s Institute for Medical Engineering and Science.

Research topics

Computer Science
Data science
Machine Learning
Political Science
Artificial Intelligence
Medicine
Public relations
Knowledge management
Mathematics
Pathology

Selected publications

Causal Machine Learning Is Not a Panacea: A Roadmap for Observational Causal Inference in Health
arXiv (Cornell University) · 2026-05-20
preprint · Open access
Objective: The growing availability of large-scale observational clinical datasets and challenges in conducting randomized controlled trials have spurred enthusiasm in using causal machine learning (ML) for causal inference in observational data. We present a roadmap for applying causal ML to observational data. Materials and methods: We outline the importance of assessing validity assumptions within available data and applying causal ML responsibly for clinical experts using causal ML and ML practitioners with limited clinical expertise. Observations: Despite advances in causal ML, its limitations remain largely under-appreciated across disciplines. This gap in shared knowledge may impact the validity of findings. Discussion: Causal assumptions must be satisfied and modeling choices justified. Otherwise, these approaches risk producing biased or misleading results, with consequences for clinical research and patient care. Conclusion: Causal ML can be a powerful tool for generating causal hypotheses. We provide a template to strengthen the rigor and interpretability of causal analyses.
IP04-25 KTGROW-ML: A NOVEL, MACHINE-LEARNING-BASED, MODEL TO PREDICT THE FUTURE GROWTH OF KIDNEY TUMORS
The Journal of Urology · 2026-04-27
article
Development and Deployment of a Machine Learning Model to Triage the Use of Prostate MRI (ProMT-ML) in Patients With Suspected Prostate Cancer
Journal of Magnetic Resonance Imaging · 2025-11-04 · 3 citations
article · Open access
BACKGROUND: Access to prostate MRI remains limited due to resource constraints and the need for expert interpretation. PURPOSE: To develop machine learning (ML) models that enable risk-based triage for prostate MRI (ProMT-ML) in the evaluation of prostate cancer. STUDY TYPE: Retrospective and prospective. POPULATION: A total of 11,879 retrospective MRI scans for suspected prostate cancer from a multi-hospital health system, divided into training (N = 9504) and test (N = 2375) sets. A total of 4551 records for prospective validation. FIELD STRENGTH/SEQUENCE: 1.5T and 3T/Turbo-spin echo T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced (DCE). ASSESSMENT: Prostate Imaging Reporting and Data System (PI-RADS) scores were retrieved from MRI reports. The Boruta algorithm was used to select final input features from candidate features. Two models were developed using supervised ML to estimate the likelihood of an abnormal MRI, defined as PI-RADS ≥ 3: Model A (with prostate volume) and Model B (without prostate volume). Models were compared to PSA. Prostate biopsy pathology was assessed to evaluate potential clinical impact. STATISTICAL TESTS: Area under the receiver operating characteristic curve (AUC) was the primary performance metric. RESULTS: A total of 5580 (46.9%) subjects had a PI-RADS score ≥ 3. After feature selection, Model A included age, PSA, body mass index, and prostate volume, while Model B included age, PSA, body mass index, and systolic blood pressure. Both models A (AUC 0.711) and B (AUC 0.616) significantly outperformed PSA (AUC 0.593). Compared to PSA threshold > 4 ng/mL, Model A demonstrated significantly improved specificity (28.3% vs. 21.9%) and no significant difference in sensitivity (89.0% vs. 86.7%). Among false negatives (Model A: 8.0% (62/776); Model B: 16.8% (130/776)), most (Model A: 87%; Model B: 69%) had benign or clinically insignificant disease on biopsy. 
On prospective validation, both versions of ProMT-ML significantly outperformed PSA. DATA CONCLUSION: ProMT-ML provides personalized risk estimates of abnormal prostate MRI and can support triage of this test. TECHNICAL EFFICACY: Stage 4.
When accurate prediction models yield harmful self-fulfilling prophecies
Patterns · 2025-04-01 · 11 citations
article · Open access
Prediction models are popular in medical research and practice. Many expect that by predicting patient-specific outcomes, these models have the potential to inform treatment decisions, and they are frequently lauded as instruments for personalized, data-driven healthcare. We show, however, that using prediction models for decision-making can lead to harm, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcome of these patients does not diminish the discrimination of the model. Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated before and after deployment are useless for decision-making, as they make no change in the data distribution. These results call for a reconsideration of standard practices for validation and deployment of prediction models that are used in medical decisions.
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
ArXiv.org · 2025-03-20
preprint · Open access · Senior author
Problems in fields such as healthcare, robotics, and finance require reasoning about the value both of what decision or action to take and when to take it. The prevailing hope is that artificial intelligence will support such decisions by estimating the causal effect of policies such as how to treat patients or how to allocate resources over time. However, existing methods for estimating the effect of a policy struggle with irregular time. They either discretize time or disregard the effect of timing policies. We present a new deep-Q algorithm, called Earliest Disagreement Q-Evaluation (EDQ), that estimates the effect of both when and what to do. EDQ makes use of recursion for the Q-function that is compatible with flexible sequence models, such as transformers. EDQ provides accurate estimates under standard assumptions. We validate the approach through experiments on survival time and tumor growth tasks.
Three Forms of Stochastic Injection for Improved Distribution-to-Distribution Generative Modeling
ArXiv.org · 2025-10-08
preprint · Open access
Modeling transformations between arbitrary data distributions is a fundamental scientific challenge, arising in applications like drug discovery and evolutionary simulation. While flow matching offers a natural framework for this task, its use has thus far primarily focused on the noise-to-data setting, while its application in the general distribution-to-distribution setting is underexplored. We find that in the latter case, where the source is also a data distribution to be learned from limited samples, standard flow matching fails due to sparse supervision. To address this, we propose a simple and computationally efficient method that injects stochasticity into the training process by perturbing source samples and flow interpolants. On five diverse imaging tasks spanning biology, radiology, and astronomy, our method significantly improves generation quality, outperforming existing baselines by an average of 9 FID points. Our approach also reduces the transport cost between input and generated samples to better highlight the true effect of the transformation, making flow matching a more practical tool for simulating the diverse distribution transformations that arise in science.
A General Framework for Inference-time Scaling and Steering of Diffusion Models
arXiv (Cornell University) · 2025-01-12 · 2 citations
preprint · Open access · Senior author
Diffusion models produce impressive results in modalities ranging from images and video to protein design and text. However, generating samples with user-specified properties remains a challenge. Recent research proposes fine-tuning models to maximize rewards that capture desired properties, but these methods require expensive training and are prone to mode collapse. In this work, we present Feynman-Kac (FK) steering, an inference-time framework for steering diffusion models with reward functions. FK steering works by sampling a system of multiple interacting diffusion processes, called particles, and resampling particles at intermediate steps based on scores computed using functions called potentials. Potentials are defined using rewards for intermediate states and are selected such that a high value indicates that the particle will yield a high-reward sample. We explore various choices of potentials, intermediate rewards, and samplers. We evaluate FK steering on text-to-image and text diffusion models. For steering text-to-image models with a human preference reward, we find that FK steering a 0.8B parameter model outperforms a 2.6B parameter fine-tuned model on prompt fidelity, with faster sampling and no training. For steering text diffusion models with rewards for text quality and specific text attributes, we find that FK steering generates lower perplexity, more linguistically acceptable outputs and enables gradient-free control of attributes like toxicity. Our results demonstrate that inference-time scaling and steering of diffusion models - even with off-the-shelf rewards - can provide significant sample quality gains and controllability benefits. Code is available at https://github.com/zacharyhorvitz/Fk-Diffusion-Steering .
Transformer-based artificial intelligence on single-cell clinical data for homeostatic mechanism inference and rational biomarker discovery
medRxiv · 2025-03-25 · 1 citation
preprint · Open access
Artificial intelligence (AI) applied to single-cell data has the potential to transform our understanding of biological systems by revealing patterns and mechanisms that simpler traditional methods miss. Here, we develop a general-purpose, interpretable AI pipeline consisting of two deep learning models: the Multi-Input Set Transformer++ (MIST) model for prediction and the single-cell FastShap model for interpretability. We apply this pipeline to a large set of routine clinical data containing single-cell measurements of circulating red blood cells (RBC), white blood cells (WBC), and platelets (PLT) to study population fluxes and homeostatic hematological mechanisms. We find that MIST can use these single-cell measurements to explain 70-82% of the variation in blood cell population sizes among patients (RBC count, PLT count, WBC count), compared to 5-20% explained with current approaches. MIST's accuracy implies that substantial information on cellular production and clearance is present in the single-cell measurements. MIST identified substantial crosstalk among RBC, WBC, and PLT populations, suggesting co-regulatory relationships that we validated and investigated using interpretability maps generated by single-cell FastShap. The maps identify granular single-cell subgroups most important for each population's size, enabling generation of evidence-based hypotheses for co-regulatory mechanisms. The interpretability maps also enable rational discovery of a single-WBC biomarker, "Down Shift", that complements an existing marker of inflammation and strengthens diagnostic associations with diseases including sepsis, heart disease, and diabetes. This study illustrates how single-cell data can be leveraged for mechanistic inference with potential clinical relevance and how this AI pipeline can be applied to power scientific discovery.
No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models
ArXiv.org · 2025-10-22
preprint · Open access
Masked diffusion language models (MDLMs) are trained to in-fill positions in randomly masked sequences, in contrast to next-token prediction models. Discussions around MDLMs focus on two benefits: (1) any-order decoding and (2) multi-token decoding. However, we observe that for math and coding tasks, any-order algorithms often underperform or behave similarly to left-to-right sampling, and standard multi-token decoding significantly degrades performance. At inference time, MDLMs compute the conditional distribution of all masked positions. A natural question is: How can we justify this additional compute when left-to-right one-token-at-a-time decoding is on par with any-order decoding algorithms? First, we propose reasoning-as-infilling. By using MDLMs to infill a reasoning template, we can structure outputs and distinguish between reasoning and answer tokens. In turn, this enables measuring answer uncertainty during reasoning, and early exits when the model converges on an answer. Next, given an answer, reasoning-as-infilling enables sampling from the MDLM posterior over reasoning traces conditioned on the answer, providing a new source of high-quality data for post-training. On GSM8k, we observe that fine-tuning LLaDA-8B Base on its posterior reasoning traces provides a performance boost on par with fine-tuning on human-written reasoning traces. Additionally, given an answer, reasoning-as-infilling provides a method for scoring the correctness of the reasoning process at intermediate steps. Second, we propose multi-token entropy decoding (MED), a simple adaptive sampler that minimizes the error incurred by decoding positions in parallel based on the conditional entropies of those positions. MED preserves performance across benchmarks and leads to 2.7x fewer steps. Our work demonstrates that the training and compute used by MDLMs unlock many new inference and post-training methods.
Development, External Validation, and Deployment of RFAN-ML: A Machine Learning Model to Estimate Renal Function After Nephrectomy
JCO Clinical Cancer Informatics · 2025-11-01
article
PURPOSE: Partial nephrectomy has been advocated as the preferred surgical approach for small kidney tumors over total nephrectomy. However, partial nephrectomy is associated with increased perioperative risk. Estimating renal function after nephrectomy can facilitate personalized patient counseling, guide surgical approach, and identify patients who could benefit from perioperative interventions. Existing prediction models have several limitations including the lack of external validation or a user-friendly tool or application, and most have used traditional statistical methods. METHODS: We used data from two academic medical institutions and machine learning (ML) methods to develop and externally validate renal function after nephrectomy-machine learning (RFAN-ML), a model to estimate long-term renal function after partial or total nephrectomy. Boruta feature selection was used to select four routinely available clinical features, specifically age, BMI, preoperative renal function, and nephrectomy type. In the training set of 1,932 patients, we compared six ML regression models representing a set of both ensemble and nonensemble ML algorithms and optimized for root mean squared error (RMSE). This model was evaluated in a test set of 1,995 patients, and the best performing model was selected as RFAN-ML. RESULTS: , and mean absolute error. CONCLUSION: We developed and externally validated RFAN-ML, a ML model to predict renal function after nephrectomy, and have deployed our model online. RFAN-ML has the potential to improve the care and outcomes in patients with kidney tumors by informing personalized patient counseling and guiding surgical planning.

Recent grants

Deep probabilistic predictive models for stroke and coronary heart disease
NIH · $3.3M · 2019–2026
Career: Building Models that Avoid Spurious Correlations through Interpretability and Representation Learning
NSF · $547k · 2022–2027

Frequent coauthors

Aahlad Puli
New York University
60 shared
David M. Blei
59 shared
Mukund Sudarshan
Courant Institute of Mathematical Sciences
40 shared
Marzyeh Ghassemi
30 shared
Neil Jethani
30 shared
Dustin Tran
22 shared
Wouter A. C. van Amsterdam
Heidelberg University
19 shared
Yindalon Aphinyanaphongs
New York University
18 shared

Labs

CILVR at NYU · PI

Education

B.S.
Stanford University
Ph.D.
Princeton University

Awards & honors

Best paper award: Best application paper at ICML 2009
