Rahul Mazumder

Nanyang Technological University Associate Professor of Operations Research and Statistics

Massachusetts Institute of Technology · Operations Research and Statistics

Active 1992–2026

h-index: 20
Citations: 3.8k
Papers: 167 · 92 last 5y
Funding: $318k

About

Rahul Mazumder is the Nanyang Technological University Associate Professor of Operations Research and Statistics and an Associate Professor at the MIT Sloan School of Management. His research interests include data science, statistical machine learning, large-scale optimization, mathematical programming, and their interplay. He is particularly interested in 'big data' applications in environmental and climate studies, social science, and recommender systems. Mazumder has published in journals including the Journal of Machine Learning Research, Annals of Statistics, Journal of the American Statistical Association, and Annals of Applied Statistics. He completed his BS and MS in statistics at the Indian Statistical Institute, Kolkata, in 2007 and earned his PhD in statistics from Stanford University in 2012.

Research topics

  • Artificial Intelligence
  • Machine Learning
  • Computer Science
  • Mathematics
  • Data Mining
  • Engineering
  • Algorithm
  • Mathematical optimization
  • Geometry

Selected publications

  • MOONSHOT: A Framework for Multi-Objective Pruning of Vision and Large Language Models

    arXiv (Cornell University) · 2026-04-14

    article · Open access · Senior author

    Weight pruning is a common technique for compressing large neural networks. We focus on the challenging post-training one-shot setting, where a pre-trained model is compressed without any retraining. Existing one-shot pruning methods typically optimize a single objective, such as a layer-wise reconstruction loss or a second-order Taylor approximation of the training loss. We highlight that neither objective alone is consistently the most effective across architectures and sparsity levels. Motivated by this insight, we propose MOONSHOT, a general and flexible framework that extends any single-objective pruning method into a multi-objective formulation by jointly optimizing both the layer-wise reconstruction error and second-order Taylor approximation of the training loss. MOONSHOT acts as a wrapper around existing pruning algorithms. To enable this integration while maintaining scalability to billion-parameter models, we propose modeling decisions and introduce an efficient procedure for computing the inverse Hessian, preserving the efficiency of state-of-the-art one-shot pruners. When combined with state-of-the-art pruning methods on Llama-3.2 and Llama-2 models, MOONSHOT reduces C4 perplexity by up to 32.6% at 2:4 sparsity and improves zero-shot mean accuracy across seven classification benchmarks by up to 4.9 points. On Vision Transformers, it improves accuracy on ImageNet-1k by over 5 points at 70% sparsity, and on ResNet-50, it yields a 4-point gain at 90% sparsity.

  • Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

    Journal of the American Statistical Association · 2026-04-28

    article · Open access

    Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually examined to reveal relationships between the predictors and the response. A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule, which improves accuracy. We develop a tailored exact algorithm to efficiently solve optimization problems underlying our estimator and an approximate algorithm for computing regularization paths, sequences of solutions that correspond to varying model sizes. We also establish novel non-asymptotic prediction error bounds for our proposed approach, comparing it to an oracle that chooses the best data-dependent linear combination of the rules in the ensemble subject to the same complexity constraint as our estimator. The bounds illustrate that the large-sample predictive performance of our estimator is on par with that of the oracle. Through experiments, we demonstrate that our estimator outperforms existing algorithms for rule extraction.

  • Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives

    Operations Research · 2026-04-09

    preprint · Open access · Senior author

    Recovering sparse dependency graphs in undirected Gaussian graphical models is a well-known problem in statistical machine learning. Given samples from a p-dimensional Gaussian distribution, the task amounts to estimating the p × p precision (inverse covariance) matrix under the assumption that only a small fraction of its entries are nonzero. In “Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives,” the authors introduce GraphL0. GraphL0 is an estimator based on an ℓ0-penalized pseudo-likelihood, departing from the more common ℓ1 relaxation. The resulting formulation is a convex mixed-integer program, which becomes challenging for standard commercial solvers at moderate-to-large p. To make the approach practical, the authors develop a custom nonlinear branch-and-bound algorithm, alongside scalable approximate solvers. The paper also provides new statistical guarantees for estimation accuracy and support recovery, and experiments on synthetic and real data sets show substantial computational gains over off-the-shelf solvers and competitive runtime and accuracy versus leading alternatives.

  • Modelling with categorical features via exact fusion and sparsity regularization

    Journal of the Royal Statistical Society Series B (Statistical Methodology) · 2026-03-28

    article · Senior author

    We study the high-dimensional linear regression problem with categorical predictors that have many levels. We propose a new estimation approach, which performs model compression via two mechanisms by simultaneously encouraging (a) clustering of the regression coefficients to collapse some of the categorical levels together; and (b) sparsity of the regression coefficients. We present novel mixed integer programming formulations for our estimator, and develop a custom row generation procedure to speed up the exact off-the-shelf solvers. We also propose a fast approximate algorithm for our method that obtains high-quality feasible solutions via block coordinate descent. As the main building block of our algorithm, we develop an exact algorithm for the univariate case based on dynamic programming, which can be of independent interest. We establish new theoretical guarantees for both the prediction and the cluster recovery performance of our estimator. Our numerical experiments on synthetic and real datasets demonstrate that our proposed estimator tends to outperform the state-of-the-art.

  • SPARTA: An Optimization Framework for Differentially Private Sparse Fine-Tuning

    2025-08-03

    article · Open access

    KDD ’25, Toronto, ON, Canada

  • TSENOR: Highly-Efficient Algorithm for Finding Transposable N:M Sparse Masks

    ArXiv.org · 2025-05-29

    preprint · Open access · Senior author

    Network pruning reduces the computational requirements of large neural networks, with N:M sparsity -- retaining only N out of every M consecutive weights -- offering a compelling balance between compressed model quality and hardware acceleration. However, N:M sparsity only accelerates forward-pass computations, as N:M patterns are not preserved during matrix transposition, limiting efficiency during training where both passes are computationally intensive. While transposable N:M sparsity has been proposed to address this limitation, existing methods for finding transposable N:M sparse masks either fail to scale to large models or are restricted to M=4 which results in suboptimal compression-accuracy trade-off. We introduce an efficient solver for transposable N:M masks that scales to billion-parameter models. We formulate mask generation as optimal transport problems and solve through entropy regularization and Dykstra's algorithm, followed by a rounding procedure. Our tensor-based implementation exploits GPU parallelism, achieving up to 100x speedup with only 1-10% error compared to existing methods. Our approach can be integrated with layer-wise N:M pruning frameworks including Wanda, SparseGPT and ALPS to produce transposable N:M sparse models with arbitrary N:M values. Experiments show that LLaMA3.2-8B with transposable 16:32 sparsity maintains performance close to its standard N:M counterpart and outperforms standard 2:4 sparse model, showing the practical value of our approach.

  • Efficient Algorithms for Leveraging LLMs for Generative and Predictive Recommender Systems

    2025-05-08

    article

    Large language models (LLMs) have taken the world by storm, revolutionizing the use of AI in products. While scaling laws demonstrate that larger models yield better results, making them work in production is hard, often due to latency demands on inference. In this proposed tutorial, we will share optimizations - both algorithmic and systems-related - that help leverage LLMs (both small and large) for recommendation and generative AI use cases at planet scale for the world's largest professional network - LinkedIn. In the first part of the tutorial, we will discuss state-of-the-art (SOTA) model quantization and pruning techniques. This will be in conjunction with a discussion on GPU kernel-level optimizations including minimizing memory copying, effectively utilizing shared memory, optimizing thread scheduling, and maximizing parallel efficiency. We will discuss our own experience with inventing and leveraging such techniques, while also discussing the latest advancements from other enterprises and the open source world. Our discussions will cover models ranging in size from 1 billion to 100 billion+ parameters. In the second part of the tutorial, we will discuss the latest advancements in the world of LLM knowledge distillation which can result in training very powerful and performant small language models (SLMs). We will also discuss effective instruction tuning and preference alignment techniques that help with improving accuracy and quality of results for generative use cases. Finally, we will discuss actual production use cases that benefit from the aforementioned techniques at planet scale for LinkedIn.

  • Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction

    ArXiv.org · 2025-09-15

    preprint · Open access · Senior author

    Reasoning language models such as DeepSeek-R1 produce long chain-of-thought traces during inference time which make them costly to deploy at scale. We show that using compression techniques such as neural network pruning produces greater performance loss than in typical language modeling tasks, and in some cases can make the model slower since they cause the model to produce more thinking tokens but with worse performance. We show that this is partly due to the fact that standard LLM pruning methods often focus on input reconstruction, whereas reasoning is a decode-dominated task. We introduce a simple, drop-in fix: during pruning we jointly reconstruct activations from the input and the model's on-policy chain-of-thought traces. This "Reasoning-Aware Compression" (RAC) integrates seamlessly into existing pruning workflows such as SparseGPT, and boosts their performance significantly. Code reproducing the results in the paper can be found at: https://github.com/RyanLucas3/RAC

  • Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems

    ArXiv.org · 2025-02-20 · 1 citation

    preprint · Open access · Senior author

    Large language models (LLMs) have demonstrated remarkable performance across a wide range of industrial applications, from search and recommendation systems to generative tasks. Although scaling laws indicate that larger models generally yield better generalization and performance, their substantial computational requirements often render them impractical for many real-world scenarios at scale. In this paper, we present a comprehensive set of insights for training and deploying small language models (SLMs) that deliver high performance for a variety of industry use cases. We focus on two key techniques: (1) knowledge distillation and (2) model compression via structured pruning and quantization. These approaches enable SLMs to retain much of the quality of their larger counterparts while significantly reducing training/serving costs and latency. We detail the impact of these techniques on a variety of use cases in a large professional social network platform and share deployment lessons, including hardware optimization strategies that improve speed and throughput for both predictive and reasoning-based applications in Recommendation Systems.

Labs

  • MIT Sloan School of Management · PI

Awards & honors

  • 2024 Leo Breiman Junior Award from the Statistical Learning…
  • 2023 International Indian Statistical Association (IISA) Ear…
  • 2021 Donald P. Gaver, Jr. Early Career Award from INFORMS
  • 2018 Young Investigator Program (YIP) Award from the Office…
  • 2020 INFORMS Optimization Society Prize for Young Researcher…