
Rajesh Ranganath
New York University · Computer Science
Active 2008–2026
About
Rajesh Ranganath is an Assistant Professor at the Courant Institute at NYU in Computer Science and at the Center for Data Science. He is also part of the CILVR group. His research interests include causal, statistical, and probabilistic inference, out-of-distribution detection and generalization, deep generative modeling, interpretability, and machine learning for healthcare. Before joining NYU, he earned degrees in computer science, completing his PhD at Princeton University working with Dave Blei, and his undergraduate studies at Stanford University. He has also spent time as a research affiliate at MIT’s Institute for Medical Engineering and Science.
Research topics
- Computer Science
- Data science
- Machine Learning
- Political Science
- Artificial Intelligence
- Medicine
- Public relations
- Knowledge management
- Mathematics
- Pathology
Selected publications
Causal Machine Learning Is Not a Panacea: A Roadmap for Observational Causal Inference in Health
arXiv (Cornell University) · 2026-05-20
preprint · Open access
Objective: The growing availability of large-scale observational clinical datasets and challenges in conducting randomized controlled trials have spurred enthusiasm in using causal machine learning (ML) for causal inference in observational data. We present a roadmap for applying causal ML to observational data. Materials and methods: We outline the importance of assessing validity assumptions within available data and applying causal ML responsibly for clinical experts using causal ML and ML practitioners with limited clinical expertise. Observations: Despite advances in causal ML, its limitations remain largely under-appreciated across disciplines. This gap in shared knowledge may impact the validity of findings. Discussion: Causal assumptions must be satisfied and modeling choices justified. Otherwise, these approaches risk producing biased or misleading results, with consequences for clinical research and patient care. Conclusion: Causal ML can be a powerful tool for generating causal hypotheses. We provide a template to strengthen the rigor and interpretability of causal analyses.
The Journal of Urology · 2026-04-27
article
Journal of Magnetic Resonance Imaging · 2025-11-04 · 3 citations
article · Open access
BACKGROUND: Access to prostate MRI remains limited due to resource constraints and the need for expert interpretation. PURPOSE: To develop machine learning (ML) models that enable risk-based triage for prostate MRI (ProMT-ML) in the evaluation of prostate cancer. STUDY TYPE: Retrospective and prospective. POPULATION: A total of 11,879 retrospective MRI scans for suspected prostate cancer from a multi-hospital health system, divided into training (N = 9504) and test (N = 2375) sets. A total of 4551 records for prospective validation. FIELD STRENGTH/SEQUENCE: 1.5T and 3T/Turbo-spin echo T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), and dynamic contrast-enhanced (DCE). ASSESSMENT: Prostate Imaging Reporting and Data System (PI-RADS) scores were retrieved from MRI reports. The Boruta algorithm was used to select final input features from candidate features. Two models were developed using supervised ML to estimate the likelihood of an abnormal MRI, defined as PI-RADS ≥ 3: Model A (with prostate volume) and Model B (without prostate volume). Models were compared to PSA. Prostate biopsy pathology was assessed to evaluate potential clinical impact. STATISTICAL TESTS: Area under the receiver operating characteristic curve (AUC) was the primary performance metric. RESULTS: A total of 5580 (46.9%) subjects had a PI-RADS score ≥ 3. After feature selection, Model A included age, PSA, body mass index, and prostate volume, while Model B included age, PSA, body mass index, and systolic blood pressure. Both models A (AUC 0.711) and B (AUC 0.616) significantly outperformed PSA (AUC 0.593). Compared to PSA threshold > 4 ng/mL, Model A demonstrated significantly improved specificity (28.3% vs. 21.9%) and no significant difference in sensitivity (89.0% vs. 86.7%). Among false negatives (Model A: 8.0% (62/776); Model B: 16.8% (130/776)), most (Model A: 87%; Model B: 69%) had benign or clinically insignificant disease on biopsy. On prospective validation, both versions of ProMT-ML significantly outperformed PSA. DATA CONCLUSION: ProMT-ML provides personalized risk estimates of abnormal prostate MRI and can support triage of this test. TECHNICAL EFFICACY: Stage 4.
When accurate prediction models yield harmful self-fulfilling prophecies
Patterns · 2025-04-01 · 11 citations
article · Open access
Prediction models are popular in medical research and practice. Many expect that by predicting patient-specific outcomes, these models have the potential to inform treatment decisions, and they are frequently lauded as instruments for personalized, data-driven healthcare. We show, however, that using prediction models for decision-making can lead to harm, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcome of these patients does not diminish the discrimination of the model. Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated before and after deployment are useless for decision-making, as they make no change in the data distribution. These results call for a reconsideration of standard practices for validation and deployment of prediction models that are used in medical decisions.
Time After Time: Deep-Q Effect Estimation for Interventions on When and What to do
ArXiv.org · 2025-03-20
preprint · Open access · Senior author
Problems in fields such as healthcare, robotics, and finance require reasoning about the value both of what decision or action to take and when to take it. The prevailing hope is that artificial intelligence will support such decisions by estimating the causal effect of policies such as how to treat patients or how to allocate resources over time. However, existing methods for estimating the effect of a policy struggle with irregular time. They either discretize time, or disregard the effect of timing policies. We present a new deep-Q algorithm that estimates the effect of both when and what to do called Earliest Disagreement Q-Evaluation (EDQ). EDQ makes use of recursion for the Q-function that is compatible with flexible sequence models, such as transformers. EDQ provides accurate estimates under standard assumptions. We validate the approach through experiments on survival time and tumor growth tasks.
Three Forms of Stochastic Injection for Improved Distribution-to-Distribution Generative Modeling
ArXiv.org · 2025-10-08
preprint · Open access
Modeling transformations between arbitrary data distributions is a fundamental scientific challenge, arising in applications like drug discovery and evolutionary simulation. While flow matching offers a natural framework for this task, its use has thus far primarily focused on the noise-to-data setting, while its application in the general distribution-to-distribution setting is underexplored. We find that in the latter case, where the source is also a data distribution to be learned from limited samples, standard flow matching fails due to sparse supervision. To address this, we propose a simple and computationally efficient method that injects stochasticity into the training process by perturbing source samples and flow interpolants. On five diverse imaging tasks spanning biology, radiology, and astronomy, our method significantly improves generation quality, outperforming existing baselines by an average of 9 FID points. Our approach also reduces the transport cost between input and generated samples to better highlight the true effect of the transformation, making flow matching a more practical tool for simulating the diverse distribution transformations that arise in science.
A General Framework for Inference-time Scaling and Steering of Diffusion Models
arXiv (Cornell University) · 2025-01-12 · 2 citations
preprint · Open access · Senior author
Diffusion models produce impressive results in modalities ranging from images and video to protein design and text. However, generating samples with user-specified properties remains a challenge. Recent research proposes fine-tuning models to maximize rewards that capture desired properties, but these methods require expensive training and are prone to mode collapse. In this work, we present Feynman-Kac (FK) steering, an inference-time framework for steering diffusion models with reward functions. FK steering works by sampling a system of multiple interacting diffusion processes, called particles, and resampling particles at intermediate steps based on scores computed using functions called potentials. Potentials are defined using rewards for intermediate states and are selected such that a high value indicates that the particle will yield a high-reward sample. We explore various choices of potentials, intermediate rewards, and samplers. We evaluate FK steering on text-to-image and text diffusion models. For steering text-to-image models with a human preference reward, we find that FK steering a 0.8B parameter model outperforms a 2.6B parameter fine-tuned model on prompt fidelity, with faster sampling and no training. For steering text diffusion models with rewards for text quality and specific text attributes, we find that FK steering generates lower perplexity, more linguistically acceptable outputs and enables gradient-free control of attributes like toxicity. Our results demonstrate that inference-time scaling and steering of diffusion models - even with off-the-shelf rewards - can provide significant sample quality gains and controllability benefits. Code is available at https://github.com/zacharyhorvitz/Fk-Diffusion-Steering .
medRxiv · 2025-03-25 · 1 citation
preprint · Open access
Artificial intelligence (AI) applied to single-cell data has the potential to transform our understanding of biological systems by revealing patterns and mechanisms that simpler traditional methods miss. Here, we develop a general-purpose, interpretable AI pipeline consisting of two deep learning models: the Multi-Input Set Transformer++ (MIST) model for prediction and the single-cell FastShap model for interpretability. We apply this pipeline to a large set of routine clinical data containing single-cell measurements of circulating red blood cells (RBC), white blood cells (WBC), and platelets (PLT) to study population fluxes and homeostatic hematological mechanisms. We find that MIST can use these single-cell measurements to explain 70-82% of the variation in blood cell population sizes among patients (RBC count, PLT count, WBC count), compared to 5-20% explained with current approaches. MIST's accuracy implies that substantial information on cellular production and clearance is present in the single-cell measurements. MIST identified substantial crosstalk among RBC, WBC, and PLT populations, suggesting co-regulatory relationships that we validated and investigated using interpretability maps generated by single-cell FastShap. The maps identify granular single-cell subgroups most important for each population's size, enabling generation of evidence-based hypotheses for co-regulatory mechanisms. The interpretability maps also enable rational discovery of a single-WBC biomarker, "Down Shift", that complements an existing marker of inflammation and strengthens diagnostic associations with diseases including sepsis, heart disease, and diabetes. This study illustrates how single-cell data can be leveraged for mechanistic inference with potential clinical relevance and how this AI pipeline can be applied to power scientific discovery.
No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models
ArXiv.org · 2025-10-22
preprint · Open access
Masked diffusion language models (MDLMs) are trained to in-fill positions in randomly masked sequences, in contrast to next-token prediction models. Discussions around MDLMs focus on two benefits: (1) any-order decoding and (2) multi-token decoding. However, we observe that for math and coding tasks, any-order algorithms often underperform or behave similarly to left-to-right sampling, and standard multi-token decoding significantly degrades performance. At inference time, MDLMs compute the conditional distribution of all masked positions. A natural question is: How can we justify this additional compute when left-to-right one-token-at-a-time decoding is on par with any-order decoding algorithms? First, we propose reasoning-as-infilling. By using MDLMs to infill a reasoning template, we can structure outputs and distinguish between reasoning and answer tokens. In turn, this enables measuring answer uncertainty during reasoning, and early exits when the model converges on an answer. Next, given an answer, reasoning-as-infilling enables sampling from the MDLM posterior over reasoning traces conditioned on the answer, providing a new source of high-quality data for post-training. On GSM8k, we observe that fine-tuning LLaDA-8B Base on its posterior reasoning traces provides a performance boost on par with fine-tuning on human-written reasoning traces. Additionally, given an answer, reasoning-as-infilling provides a method for scoring the correctness of the reasoning process at intermediate steps. Second, we propose multi-token entropy decoding (MED), a simple adaptive sampler that minimizes the error incurred by decoding positions in parallel based on the conditional entropies of those positions. MED preserves performance across benchmarks and leads to 2.7x fewer steps. Our work demonstrates that the training and compute used by MDLMs unlock many new inference and post-training methods.
JCO Clinical Cancer Informatics · 2025-11-01
article
PURPOSE: Partial nephrectomy has been advocated as the preferred surgical approach for small kidney tumors over total nephrectomy. However, partial nephrectomy is associated with increased perioperative risk. Estimating renal function after nephrectomy can facilitate personalized patient counseling, guide surgical approach, and identify patients who could benefit from perioperative interventions. Existing prediction models have several limitations including the lack of external validation or a user-friendly tool or application, and most have used traditional statistical methods. METHODS: We used data from two academic medical institutions and machine learning (ML) methods to develop and externally validate renal function after nephrectomy-machine learning (RFAN-ML), a model to estimate long-term renal function after partial or total nephrectomy. Boruta feature selection was used to select four routinely available clinical features, specifically age, BMI, preoperative renal function, and nephrectomy type. In the training set of 1,932 patients, we compared six ML regression models representing a set of both ensemble and nonensemble ML algorithms and optimized for root mean squared error (RMSE). This model was evaluated in a test set of 1,995 patients, and the best performing model was selected as RFAN-ML. RESULTS: , and mean absolute error. CONCLUSION: We developed and externally validated RFAN-ML, an ML model to predict renal function after nephrectomy, and have deployed our model online. RFAN-ML has the potential to improve the care and outcomes in patients with kidney tumors by informing personalized patient counseling and guiding surgical planning.
Recent grants
Deep probabilistic predictive models for stroke and coronary heart disease
NIH · $3.3M · 2019–2026
NSF · $547k · 2022–2027
Frequent coauthors
- Aahlad Puli · New York University · 60 shared
- David M. Blei · 59 shared
- Mukund Sudarshan · Courant Institute of Mathematical Sciences · 40 shared
- Marzyeh Ghassemi · 30 shared
- Neil Jethani · 30 shared
- Dustin Tran · 22 shared
- Wouter A. C. van Amsterdam · Heidelberg University · 19 shared
- Yindalon Aphinyanaphongs · New York University · 18 shared
Education
B.S. · Stanford University
Ph.D. · Princeton University
Awards & honors
- Best paper award: Best application paper at ICML 2009