Rama Chellappa

Bloomberg Distinguished Professor

Johns Hopkins University · Radiology and Radiological Science

Active 1980–2026

h-index: 124

Citations: 73.7k

Papers: 1.4k (235 in the last 5 years)

Funding: $45.4M (1 active)


About

Professor Rama Chellappa is a Bloomberg Distinguished Professor with joint appointments in the Departments of Electrical and Computer Engineering and Biomedical Engineering (School of Medicine) at Johns Hopkins University (JHU). He joined JHU in August 2020 after serving as a Distinguished University Professor and Minta Martin Professor of Engineering at the University of Maryland (UMD), College Park. Professor Chellappa received his B.E. (Hons.) degree in Electronics and Communication Engineering from the University of Madras, India, in 1975, his M.E. (with Distinction) from the Indian Institute of Science, Bangalore, India, in 1977, and his M.S.E.E. and Ph.D. degrees in Electrical Engineering from Purdue University in 1978 and 1981, respectively. From 1981 to 1991, he was a faculty member in the Department of Electrical Engineering-Systems at the University of Southern California (USC). Between 1991 and 2020, he was a Professor of Electrical and Computer Engineering and an affiliate Professor of Computer Science at UMD, where he was also affiliated with the Center for Automation Research, the Institute for Advanced Computer Studies, and the Applied Mathematics and Scientific Computing Program. Professor Chellappa's current research interests encompass computer vision, machine learning, and artificial intelligence, with applications in face recognition, 3D modeling from video, image- and video-based recognition of objects, events, and activities, medical imaging, domain adaptation, and generalization. His career reflects a commitment to advancing electrical engineering and biomedical engineering through research and interdisciplinary collaboration.

Research topics

Artificial Intelligence
Computer Science
Statistics
Mathematics

Selected publications

DiffRegCD: Integrated Registration and Change Detection with Diffusion Features
2026-03-06
Article · Open access
Change detection (CD) is critical in computer vision and remote sensing, with applications in monitoring, disaster response, and urban analysis. Most CD models assume co-registered inputs, but real imagery often suffers from parallax, viewpoint shifts, or long temporal gaps, leading to severe misalignment. Conventional register-then-detect pipelines and recent joint frameworks (e.g., BiFA, ChangeRD) remain limited: they rely on regression-only flow, global homographies, or synthetic perturbations that fail under large displacements. We propose DiffRegCD, an integrated framework that couples dense registration and change detection. DiffRegCD reformulates correspondence as a Gaussian-smoothed classification task, delivering sub-pixel accuracy and stable training. It builds on frozen multi-scale features from a pretrained denoising diffusion model, which provide invariance to viewpoint and illumination variation. Supervision is enabled by controlled affine perturbations applied to standard CD datasets, yielding paired ground truth for both flow and change detection without pseudo-labels. Experiments on aerial (LEVIR-CD, DSIFN-CD, WHU-CD, SYSU-CD) and ground-level (VL-CMU-CD) datasets show that DiffRegCD outperforms recent baselines and remains robust under wide temporal and viewpoint variation, establishing diffusion features and classification-based correspondence as a strong foundation for integrated CD. The code is available at GitHub.
Uncertainty-quantified Pulse Signal Recovery from Facial Video using Regularized Stochastic Interpolants
arXiv (Cornell University) · 2026-04-12
Article · Open access
Imaging Photoplethysmography (iPPG), an optical procedure that recovers a human's blood volume pulse (BVP) waveform using pixel readout from a camera, is an exciting research field with many researchers performing clinical studies of iPPG algorithms. While current algorithms to solve the iPPG task have shown outstanding performance on benchmark datasets, no state-of-the-art algorithm, to the best of our knowledge, performs test-time sampling of the solution space, precluding an uncertainty analysis that is critical for clinical applications. We address this deficiency through a new paradigm named Regularized Interpolants with Stochasticity for iPPG (RIS-iPPG). Modeling iPPG recovery as an inverse problem, we build probability paths that evolve the camera pixel distribution to the ground-truth signal distribution by predicting the instantaneous flow and score vectors of a time-dependent stochastic process; at test time, we sample the posterior distribution of the correct BVP waveform given the camera pixel intensity measurements by solving a stochastic differential equation. Given that physiological changes are slowly varying, we show that iPPG recovery can be improved through regularization that maximizes the correlation between the residual flow vector predictions of two adjacent time windows. Experimental results on three datasets show that RIS-iPPG provides superior reconstruction quality and uncertainty estimates of the reconstruction, a critical tool for the widespread adoption of iPPG algorithms in clinical and consumer settings.
Encoding of Demographic and Anatomical Information in Chest X-Ray-Based Severe Left Ventricular Hypertrophy Classifiers
Biomedicines · 2025-09-02 · 1 citation
Article · Open access
Background. Severe left ventricular hypertrophy (SLVH) is a high-risk structural cardiac abnormality associated with increased risk of heart failure. It is typically assessed using echocardiography or cardiac magnetic resonance imaging, but these modalities are limited by cost, accessibility, and workflow burden. We introduce a deep learning framework that classifies SLVH directly from chest radiographs, without intermediate anatomical estimation models or demographic inputs. A key contribution of this work lies in interpretability. We quantify how clinically relevant attributes are encoded within internal representations, enabling transparent model evaluation and integration into AI-assisted workflows. Methods. We construct class-balanced subsets from the CheXchoNet dataset with equal numbers of SLVH-positive and negative cases while preserving the original train, validation, and test proportions. ResNet-18 is fine-tuned from ImageNet weights, and a Vision Transformer (ViT) encoder is pretrained via masked autoencoding with a trainable classification head. No anatomical or demographic inputs are used during training. We apply Mutual Information Neural Estimation (MINE) to quantify dependence between learned features and five attributes: age, sex, interventricular septal diameter (IVSDd), posterior wall diameter (LVPWDd), and internal diameter (LVIDd). Results. ViT achieves an AUROC of 0.82 [95% CI: 0.78–0.85] and an AUPRC of 0.80 [95% CI: 0.76–0.85], indicating strong performance in SLVH detection from chest radiographs. MINE reveals clinically coherent attribute encoding in learned features: age > sex > IVSDd > LVPWDd > LVIDd. Conclusions. This study shows that SLVH can be accurately classified from chest radiographs alone. The framework combines diagnostic performance with quantitative interpretability, supporting reliable deployment in triage and decision support.
DiffProtect: Generative adversarial examples using diffusion models for facial privacy protection
Pattern Recognition · 2025-11-24 · 1 citation
Article · Open access · Senior author
• Diffusion model-based adversarial attacks for facial privacy protection with high visual quality.
• Face semantics regularization module preserves visual identity during facial privacy protection.
• Attack acceleration strategy significantly improves efficiency while maintaining performance.
• 24.5% absolute improvement in attack success rate compared to state-of-the-art methods.
• Real-world validation with commercial API and user study shows practical effectiveness.

The increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social media. To address this challenge, several adversarial attack methods have been proposed to protect individuals from being identified by unauthorized FR systems with perturbed facial images. However, these approaches suffer from poor visual quality or low attack success rates, which limit their practical utility. Recently, diffusion models have achieved tremendous success in image generation. In this work, we ask: can diffusion models be used to generate adversarial examples against FR systems to improve both visual quality and attack performance? We propose DiffProtect, a novel method leveraging a diffusion autoencoder to generate semantically meaningful perturbations on FR systems. Extensive experiments demonstrate that DiffProtect produces more natural-looking encrypted images than state-of-the-art methods while achieving significantly higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the CelebA-HQ and FFHQ datasets. We further evaluate the effectiveness of DiffProtect in the real world using a commercial FR API and validate its usefulness in practice through a user study. Our code is available at https://github.com/joellliu/DiffProtect.
Speedy MASt3R
2025-08-04 · 1 citation
Article · Senior author
MASt3R redefines image matching as a 3D task but suffers from high inference latency (198ms per image pair on an A40 GPU). We introduce Speedy MASt3R, a post-training optimization framework that achieves a 54% speedup (91ms per pair) without compromising accuracy. Our approach incorporates four key techniques: (1) FlashMatch, which leverages FlashAttention v2 for efficient attention computation; (2) GraphFusion, which optimizes the computation graph using TensorRT; (3) FastNN-Lite, which reduces complexity from quadratic to linear; and (4) HybridCast, which enables mixed-precision inference. Evaluations on five benchmarks (Aachen Day-Night, InLoc, 7-Scenes, ScanNet1500, MegaDepth1500) demonstrate consistent performance, highlighting real-time 3D understanding capabilities.
Distillation-Guided Representation Learning for Unconstrained Video Human Authentication
IEEE Transactions on Biometrics Behavior and Identity Science · 2025-08-04
Article · Open access
Human authentication is an important and challenging biometric task, particularly from unconstrained videos. While body recognition is a popular approach, gait recognition holds the promise of robustly identifying subjects based on walking patterns instead of appearance information. Previous gait-based approaches have performed well for curated indoor scenes; however, they tend to underperform in unconstrained situations. To address these challenges, we propose a framework, termed Holistic GAit DEtection and Recognition (H-GADER), for human authentication in challenging outdoor scenarios. Specifically, H-GADER leverages a Double Helical Signature to detect segments that contain human movement and builds discriminative features through a novel gait recognition method. To further enhance robustness, H-GADER encodes viewpoint information in its architecture and distills learned representations from an auxiliary RGB recognition model; this allows H-GADER to learn from the maximum amount of data at training time. At test time, H-GADER infers solely from the silhouette modality. Furthermore, we introduce a body recognition model through semantic, large-scale, self-supervised training to complement gait recognition. By conditionally fusing gait and body representations based on the presence/absence of gait information as decided by the gait detection, we demonstrate significant improvements compared to when a single modality or a naive feature ensemble is used. We evaluate our method on multiple existing state-of-the-art (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets, especially on the BRIAR dataset, which features unconstrained, long-distance videos, achieving a 28.9% improvement.
Enrich and Detect: Video Temporal Grounding With Multimodal LLMs
2025-10-19 · 1 citation
Article · Open access
We introduce ED-VTG, a method for fine-grained video temporal grounding utilizing multimodal large language models. Our approach harnesses the capabilities of multimodal LLMs to jointly process text and video, in order to effectively localize natural language queries in videos through a two-stage process. Rather than being directly grounded, language queries are initially transformed into enriched sentences that incorporate missing details and cues to aid in grounding. In the second stage, these enriched queries are grounded using a lightweight decoder, which specializes in predicting accurate boundaries conditioned on contextualized representations of the enriched queries. To mitigate noise and reduce the impact of hallucinations, our model is trained with a multiple-instance-learning objective that dynamically selects the optimal version of the query for each training sample. We demonstrate state-of-the-art results across various benchmarks in temporal video grounding and paragraph grounding settings. Experiments reveal that our method significantly outperforms all previously proposed LLM-based temporal grounding approaches and is either superior or comparable to specialized models, while maintaining a clear advantage against them in zero-shot evaluation scenarios.
Innovation in geriatrics: what this series means for care
Innovation in Aging · 2025-11-08
Article · Open access · 1st author · Corresponding
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
2025-10-19
Preprint · Open access
We address the problem of video question answering (video QA) with temporal grounding in a weakly supervised setup, without any temporal annotations. Given a video and a question, we generate an open-ended answer grounded with the start and end time. For this task, we propose TOGA: a vision-language model for Temporally Grounded Open-Ended Video QA with Weak Supervision. We instruct-tune TOGA to jointly generate the answer and the temporal grounding. We operate in a weakly supervised setup where the temporal grounding annotations are not available. We generate pseudo labels for temporal grounding and ensure the validity of these labels by imposing a consistency constraint between the question of a grounding response and the response generated by a question referring to the same temporal segment. We notice that jointly generating the answers with the grounding improves performance on question answering as well as grounding. We evaluate TOGA on grounded QA and open-ended QA tasks. For grounded QA, we consider the NExT-GQA benchmark which is designed to evaluate weakly supervised grounded question answering. For open-ended QA, we consider the MSVD-QA and ActivityNet-QA benchmarks. We achieve state-of-the-art performance for both tasks on these benchmarks.

Recent grants

Clinical Translation and Validation Core
NIH · $42.8M · 2021–2027
ITR: New technology for the Capture, Analysis and Visualization of Human Movement
NSF · $2.6M · 2003–2009

Frequent coauthors

Anil K. Jain
1688 shared
Joydeep Ghosh
1682 shared
Josef Kittler
1682 shared
Takeo Kanade
1682 shared
Gennady Osipov
1681 shared
Witold Russia
Conference Board
1681 shared
Madhu Sudan
Harvard University Press
1681 shared

Labs

AIEM Lab · PI

Education

PhD, Electrical and Computer Engineering
Purdue University
1981
MSEE, Electrical and Computer Engineering
Purdue University
1978
Master of Engineering (Distinction), Electrical Communication Engineering
Indian Institute of Science
1977
Bachelor of Engineering (Hons.), Electronics and Communication Engineering
Anna University Chennai College of Engineering Guindy
1975

Awards & honors

2020 Jack S. Kilby Signal Processing Medal
IEEE Life Fellow
Society Award from IEEE Signal Processing Society
IEEE Computer Society Technical Achievement Award
2025 Azriel Rosenfeld Lifetime Achievement Award
