
Rama Chellappa
Bloomberg Distinguished Professor · Verified · Johns Hopkins University · Radiology and Radiological Science
Active 1980–2026
About
Professor Rama Chellappa is a Bloomberg Distinguished Professor with joint appointments in the Departments of Electrical and Computer Engineering and Biomedical Engineering (School of Medicine) at Johns Hopkins University (JHU). He joined JHU in August 2020 after serving as a Distinguished University Professor and Minta Martin Professor of Engineering at the University of Maryland (UMD), College Park. Professor Chellappa received his B.E. (Hons.) degree in Electronics and Communication Engineering from the University of Madras, India, in 1975, his M.E. (with Distinction) from the Indian Institute of Science, Bangalore, India, in 1977, and his M.S.E.E. and Ph.D. degrees in Electrical Engineering from Purdue University in 1978 and 1981, respectively. From 1981 to 1991, he was a faculty member in the Department of Electrical Engineering-Systems at the University of Southern California (USC). Between 1991 and 2020, he was a Professor of Electrical and Computer Engineering and an affiliate Professor of Computer Science at UMD, where he was also affiliated with the Center for Automation Research, the Institute for Advanced Computer Studies, and the Applied Mathematics and Scientific Computing Program. Professor Chellappa's current research interests encompass computer vision, machine learning, and artificial intelligence, with applications in face recognition, 3D modeling from video, image and video-based recognition of objects, events, and activities, medical imaging, domain adaptation, and generalization. His extensive academic and research career reflects a deep commitment to advancing the fields of electrical engineering and biomedical engineering through innovative research and interdisciplinary collaboration.
Research topics
- Artificial Intelligence
- Computer Science
- Statistics
- Mathematics
Selected publications
DiffRegCD: Integrated Registration and Change Detection with Diffusion Features
2026-03-06
article · Open access
Change detection (CD) is critical in computer vision and remote sensing, with applications in monitoring, disaster response, and urban analysis. Most CD models assume co-registered inputs, but real imagery often suffers from parallax, viewpoint shifts, or long temporal gaps, leading to severe misalignment. Conventional register-then-detect pipelines and recent joint frameworks (e.g., BiFA, ChangeRD) remain limited: they rely on regression-only flow, global homographies, or synthetic perturbations that fail under large displacements. We propose DiffRegCD, an integrated framework that couples dense registration and change detection. DiffRegCD reformulates correspondence as a Gaussian-smoothed classification task, delivering sub-pixel accuracy and stable training. It builds on frozen multi-scale features from a pretrained denoising diffusion model, which provide invariance to viewpoint and illumination variation. Supervision is enabled by controlled affine perturbations applied to standard CD datasets, yielding paired ground truth for both flow and change detection without pseudo-labels. Experiments on aerial (LEVIR-CD, DSIFN-CD, WHU-CD, SYSU-CD) and ground-level (VL-CMU-CD) datasets show that DiffRegCD outperforms recent baselines and remains robust under wide temporal and viewpoint variation, establishing diffusion features and classification-based correspondence as a strong foundation for integrated CD. The code is available at GitHub.
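The abstract's idea of supervising registration with controlled affine perturbations can be illustrated with a minimal sketch. This is not the authors' code: the function names, the parameterization (rotation, isotropic scale, translation), and the pixel-coordinate convention are all assumptions made for illustration. The point is only that a known affine map yields an exact dense flow field for every pixel, so no pseudo-labels are needed.

```python
import math

def affine_matrix(angle_deg, scale, tx, ty):
    """Hypothetical helper: 2x3 affine matrix (rotation and isotropic scale
    about the origin, followed by a translation)."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, -s, tx],
            [s,  c, ty]]

def ground_truth_flow(matrix, height, width):
    """Return flow[y][x] = (dx, dy), the displacement of pixel (x, y)
    under the affine map. The map is known, so the flow is exact."""
    (a, b, tx), (c, d, ty) = matrix
    flow = []
    for y in range(height):
        row = []
        for x in range(width):
            x_new = a * x + b * y + tx
            y_new = c * x + d * y + ty
            row.append((x_new - x, y_new - y))
        flow.append(row)
    return flow

# A pure translation displaces every pixel by the same offset.
flow = ground_truth_flow(affine_matrix(0.0, 1.0, 3.0, -2.0), height=4, width=5)
```

In a full pipeline the same map would presumably also be applied to the second image and its change mask, so the flow and change labels stay paired; that step is omitted here.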
arXiv (Cornell University) · 2026-04-12
article · Open access
Imaging Photoplethysmography (iPPG), an optical procedure which recovers a human's blood volume pulse (BVP) waveform using pixel readout from a camera, is an exciting research field with many researchers performing clinical studies of iPPG algorithms. While current algorithms to solve the iPPG task have shown outstanding performance on benchmark datasets, no state-of-the-art algorithm, to the best of our knowledge, performs test-time sampling of the solution space, precluding an uncertainty analysis that is critical for clinical applications. We address this deficiency through a new paradigm named Regularized Interpolants with Stochasticity for iPPG (RIS-iPPG). Modeling iPPG recovery as an inverse problem, we build probability paths that evolve the camera pixel distribution to the ground-truth signal distribution by predicting the instantaneous flow and score vectors of a time-dependent stochastic process; at test time, we sample the posterior distribution of the correct BVP waveform given the camera pixel intensity measurements by solving a stochastic differential equation. Given that physiological changes are slowly varying, we show that iPPG recovery can be improved through regularization that maximizes the correlation between the residual flow vector predictions of two adjacent time windows. Experimental results on three datasets show that RIS-iPPG provides superior reconstruction quality and uncertainty estimates of the reconstruction, a critical tool for the widespread adoption of iPPG algorithms in clinical and consumer settings.
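The regularizer described in the abstract, which rewards correlation between predictions for two adjacent time windows, could be sketched as below. This is an illustrative assumption, not the paper's formulation: the abstract does not define the residual flow vectors or the exact loss, so Pearson correlation and the `1 - correlation` form are stand-ins, and all names are hypothetical.

```python
import math

def pearson_correlation(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    denom = math.sqrt(var_u * var_v)
    return cov / denom if denom > 0 else 0.0

def correlation_regularizer(residual_a, residual_b, weight=1.0):
    """Hypothetical penalty: minimizing it maximizes the correlation between
    the residual predictions of two adjacent windows."""
    return weight * (1.0 - pearson_correlation(residual_a, residual_b))

# Perfectly correlated residuals incur no penalty; anti-correlated ones the most.
aligned = correlation_regularizer([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposed = correlation_regularizer([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Such a term would be added to the main training loss, with `weight` controlling how strongly the slowly-varying prior is enforced.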
Biomedicines · 2025-09-02 · 1 citation
article · Open access
Background. Severe left ventricular hypertrophy (SLVH) is a high-risk structural cardiac abnormality associated with increased risk of heart failure. It is typically assessed using echocardiography or cardiac magnetic resonance imaging, but these modalities are limited by cost, accessibility, and workflow burden. We introduce a deep learning framework that classifies SLVH directly from chest radiographs, without intermediate anatomical estimation models or demographic inputs. A key contribution of this work lies in interpretability. We quantify how clinically relevant attributes are encoded within internal representations, enabling transparent model evaluation and integration into AI-assisted workflows. Methods. We construct class-balanced subsets from the CheXchoNet dataset with equal numbers of SLVH-positive and negative cases while preserving the original train, validation, and test proportions. ResNet-18 is fine-tuned from ImageNet weights, and a Vision Transformer (ViT) encoder is pretrained via masked autoencoding with a trainable classification head. No anatomical or demographic inputs are used during training. We apply Mutual Information Neural Estimation (MINE) to quantify dependence between learned features and five attributes: age, sex, interventricular septal diameter (IVSDd), posterior wall diameter (LVPWDd), and internal diameter (LVIDd). Results. ViT achieves an AUROC of 0.82 [95% CI: 0.78–0.85] and an AUPRC of 0.80 [95% CI: 0.76–0.85], indicating strong performance in SLVH detection from chest radiographs. MINE reveals clinically coherent attribute encoding in learned features: age > sex > IVSDd > LVPWDd > LVIDd. Conclusions. This study shows that SLVH can be accurately classified from chest radiographs alone. The framework combines diagnostic performance with quantitative interpretability, supporting reliable deployment in triage and decision support.
DiffProtect: Generative adversarial examples using diffusion models for facial privacy protection
Pattern Recognition · 2025-11-24 · 1 citation
article · Open access · Senior author
- Diffusion model-based adversarial attacks for facial privacy protection with high visual quality.
- Face semantics regularization module preserves visual identity during facial privacy protection.
- Attack acceleration strategy significantly improves efficiency while maintaining performance.
- 24.5% absolute improvement in attack success rate compared to state-of-the-art methods.
- Real-world validation with commercial API and user study shows practical effectiveness.
The increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social media. To address this challenge, several adversarial attack methods have been proposed to protect individuals from being identified by unauthorized FR systems with perturbed facial images. However, these approaches suffer from poor visual quality or low attack success rates, which limit their practical utility. Recently, diffusion models have achieved tremendous success in image generation. In this work, we ask: can diffusion models be used to generate adversarial examples against FR systems to improve both visual quality and attack performance? We propose DiffProtect, a novel method leveraging a diffusion autoencoder to generate semantically meaningful perturbations on FR systems. Extensive experiments demonstrate that DiffProtect produces more natural-looking encrypted images than state-of-the-art methods while achieving significantly higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the CelebA-HQ and FFHQ datasets. We further evaluate the effectiveness of DiffProtect in the real world using a commercial FR API and validate its usefulness in practice through a user study. Our code is available at https://github.com/joellliu/DiffProtect.
2025-08-04 · 1 citation
article · Senior author
MASt3R redefines image matching as a 3D task but suffers from high inference latency (198 ms per image pair on an A40 GPU). We introduce Speedy MASt3R, a post-training optimization framework that achieves a 54% speedup (91 ms per pair) without compromising accuracy. Our approach incorporates four key techniques: (1) FlashMatch, which leverages FlashAttention v2 for efficient attention computation; (2) GraphFusion, which optimizes the computation graph using TensorRT; (3) FastNN-Lite, which reduces complexity from quadratic to linear; and (4) HybridCast, which enables mixed-precision inference. Evaluations on five benchmarks (Aachen Day-Night, InLoc, 7-Scenes, ScanNet1500, MegaDepth1500) demonstrate consistent performance, highlighting real-time 3D understanding capabilities.
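As a quick check on the latency figures quoted above, the short sketch below computes both the relative reduction in latency and the equivalent throughput ratio from the two reported numbers. The function names are illustrative only. The 54% figure corresponds to the reduction in per-pair latency, which is the same as roughly a 2.18x ratio between the old and new latency.

```python
def latency_reduction_percent(before_ms, after_ms):
    """Percentage by which latency drops going from before_ms to after_ms."""
    return 100.0 * (before_ms - after_ms) / before_ms

def speedup_factor(before_ms, after_ms):
    """Ratio of old latency to new latency (how many times faster)."""
    return before_ms / after_ms

# Figures reported in the abstract: 198 ms before, 91 ms after.
reduction = latency_reduction_percent(198, 91)  # about 54.0
factor = speedup_factor(198, 91)                # about 2.18
```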
Distillation-Guided Representation Learning for Unconstrained Video Human Authentication
IEEE Transactions on Biometrics Behavior and Identity Science · 2025-08-04
article · Open access
Human authentication is an important and challenging biometric task, particularly from unconstrained videos. While body recognition is a popular approach, gait recognition holds the promise of robustly identifying subjects based on walking patterns instead of appearance information. Previous gait-based approaches have performed well for curated indoor scenes; however, they tend to underperform in unconstrained situations. To address these challenges, we propose a framework, termed Holistic GAit DEtection and Recognition (H-GADER), for human authentication in challenging outdoor scenarios. Specifically, H-GADER leverages a Double Helical Signature to detect segments that contain human movement and builds discriminative features through a novel gait recognition method. To further enhance robustness, H-GADER encodes viewpoint information in its architecture and distills learned representations from an auxiliary RGB recognition model; this allows H-GADER to learn from the maximum amount of data at training time. At test time, H-GADER infers solely from the silhouette modality. Furthermore, we introduce a body recognition model through semantic, large-scale, self-supervised training to complement gait recognition. By conditionally fusing gait and body representations based on the presence or absence of gait information as decided by the gait detection, we demonstrate significant improvements compared to when a single modality or a naive feature ensemble is used. We evaluate our method on multiple existing state-of-the-art (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets, especially on the BRIAR dataset, which features unconstrained, long-distance videos, achieving a 28.9% improvement.
Enrich and Detect: Video Temporal Grounding With Multimodal LLMs
2025-10-19 · 1 citation
article · Open access
We introduce ED-VTG, a method for fine-grained video temporal grounding utilizing multimodal large language models. Our approach harnesses the capabilities of multimodal LLMs to jointly process text and video, in order to effectively localize natural language queries in videos through a two-stage process. Rather than being directly grounded, language queries are initially transformed into enriched sentences that incorporate missing details and cues to aid in grounding. In the second stage, these enriched queries are grounded using a lightweight decoder, which specializes in predicting accurate boundaries conditioned on contextualized representations of the enriched queries. To mitigate noise and reduce the impact of hallucinations, our model is trained with a multiple-instance-learning objective that dynamically selects the optimal version of the query for each training sample. We demonstrate state-of-the-art results across various benchmarks in temporal video grounding and paragraph grounding settings. Experiments reveal that our method significantly outperforms all previously proposed LLM-based temporal grounding approaches and is either superior or comparable to specialized models, while maintaining a clear advantage against them in zero-shot evaluation scenarios.
Innovation in geriatrics: what this series means for care
Innovation in Aging · 2025-11-08
article · Open access · 1st author · Corresponding
TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision
2025-10-19
preprint · Open access
We address the problem of video question answering (video QA) with temporal grounding in a weakly supervised setup, without any temporal annotations. Given a video and a question, we generate an open-ended answer grounded with the start and end time. For this task, we propose TOGA: a vision-language model for Temporally Grounded Open-Ended Video QA with Weak Supervision. We instruct-tune TOGA to jointly generate the answer and the temporal grounding. We operate in a weakly supervised setup where the temporal grounding annotations are not available. We generate pseudo labels for temporal grounding and ensure the validity of these labels by imposing a consistency constraint between the question of a grounding response and the response generated by a question referring to the same temporal segment. We notice that jointly generating the answers with the grounding improves performance on question answering as well as grounding. We evaluate TOGA on grounded QA and open-ended QA tasks. For grounded QA, we consider the NExT-GQA benchmark which is designed to evaluate weakly supervised grounded question answering. For open-ended QA, we consider the MSVD-QA and ActivityNet-QA benchmarks. We achieve state-of-the-art performance for both tasks on these benchmarks.
Recent grants
Clinical Translation and Validation Core
NIH · $42.8M · 2021–2027
ITR: New technology for the Capture, Analysis and Visualization of Human Movement
NSF · $2.6M · 2003–2009
Frequent coauthors
- 1688 shared
Anil K. Jain
- 1682 shared
Joydeep Ghosh
- 1682 shared
Josef Kittler
- 1682 shared
Takeo Kanade
- 1681 shared
Gennady Osipov
- 1681 shared
Witold Russia
Conference Board
- 1681 shared
Madhu Sudan
Harvard University Press
- 1681 shared
Publicity Co-Chairs
Asia University
Labs
AIEM Lab · PI
Education
- 1981
PhD, Electrical and Computer Engineering
Purdue University
- 1978
MSEE, Electrical and Computer Engineering
Purdue University
- 1977
Master of Engineering (Distinction), Electrical Communication Engineering
Indian Institute of Science
- 1975
Bachelor of Engineering (Hons.), Electronics and Communication Engineering
Anna University Chennai College of Engineering Guindy
Awards & honors
- 2020 Jack S. Kilby Signal Processing Medal
- IEEE Life Fellow
- Society Award from IEEE Signal Processing Society
- IEEE Computer Society Technical Achievement Award
- 2025 Azriel Rosenfeld Lifetime Achievement Award