
René Vidal
Rachleff and Penn Integrates Knowledge University Professor, Director of the Center for Innovation in Data Engineering and Science (IDEAS) · University of Pennsylvania · Statistics and Data Science
Active 1988–2025
About
René Vidal is the Rachleff University Professor at the University of Pennsylvania, with joint appointments in the Department of Radiology in the Perelman School of Medicine and the Department of Electrical and Systems Engineering in the School of Engineering and Applied Science. He is recognized as a global pioneer of data science and has been named a Penn Integrates Knowledge University Professor. Dr. Vidal received his B.S. degree in Electrical Engineering with highest honors from the Pontificia Universidad Católica de Chile in 1997, and his M.S. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California at Berkeley in 2000 and 2003, respectively. His research areas include computer vision and perception, dynamical systems and control, and machine learning/AI and autonomous systems. He has held research positions at National ICT Australia and has been a faculty member at Johns Hopkins University in the Department of Biomedical Engineering and the Center for Imaging Science.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Data Mining
- Algorithm
- Neuroscience
- Mathematical analysis
- Mathematical optimization
- Applied mathematics
- Mathematics
- Physical medicine and rehabilitation
- Combinatorics
- Medicine
- Psychology
- Human–computer interaction
Selected publications
Concept Lancet: Image Editing with Compositional Representation Transplant
ArXiv.org · 2025-04-03
preprint · Open access · Senior author
Diffusion models are widely used for image editing tasks. Existing editing methods often design a representation manipulation procedure by curating an edit direction in the text embedding or score space. However, such a procedure faces a key challenge: overestimating the edit strength harms visual consistency while underestimating it fails the editing task. Notably, each source image may require a different editing strength, and it is costly to search for an appropriate strength via trial-and-error. To address this challenge, we propose Concept Lancet (CoLan), a zero-shot plug-and-play framework for principled representation manipulation in diffusion-based image editing. At inference time, we decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts. This allows us to accurately estimate the presence of concepts in each image, which informs the edit. Based on the editing task (replace/add/remove), we perform a customized concept transplant process to impose the corresponding editing direction. To sufficiently model the concept space, we curate a conceptual representation dataset, CoLan-150K, which contains diverse descriptions and scenarios of visual terms and phrases for the latent dictionary. Experiments on multiple diffusion-based image editing baselines show that methods equipped with CoLan achieve state-of-the-art performance in editing effectiveness and consistency preservation.
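The decomposition step described above (a source embedding written as a sparse combination of concept vectors, then edited) can be illustrated with a minimal sketch. It assumes the dictionary is a plain list of concept vectors and solves the sparse-coding problem with ISTA (iterative soft-thresholding), a standard lasso solver chosen here for illustration; the abstract does not say which solver CoLan uses. `transplant_replace` is a hypothetical, simplified stand-in for a "replace" edit, not CoLan's actual transplant procedure.

```python
# Sketch of sparse concept decomposition plus a simplified "replace" edit.
# Assumptions (not from the paper): ISTA as the solver, plain Python lists as
# vectors, and the hypothetical helper names below.


def soft(v, lam):
    """Scalar soft-thresholding, the proximal operator of lam * |v|."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def matvec(D, c):
    """Return sum_j c[j] * D[j], where D is a list of concept vectors."""
    dim = len(D[0])
    return [sum(c[j] * D[j][i] for j in range(len(D))) for i in range(dim)]


def ista(x, D, lam, step, iters=200):
    """Approximately solve min_c 0.5 * ||x - D c||^2 + lam * ||c||_1.

    `step` should not exceed 1 / (largest eigenvalue of D^T D) for convergence.
    """
    c = [0.0] * len(D)
    for _ in range(iters):
        residual = [xi - yi for xi, yi in zip(x, matvec(D, c))]              # x - D c
        grad = [-sum(d[i] * residual[i] for i in range(len(x))) for d in D]  # D^T (D c - x)
        c = [soft(cj - step * g, step * lam) for cj, g in zip(c, grad)]
    return c


def transplant_replace(c, source, target):
    """Hypothetical 'replace' edit: move the source concept's weight onto the target."""
    edited = list(c)
    edited[target] += edited[source]
    edited[source] = 0.0
    return edited


# Toy example: three orthonormal "concept" directions (hypothetical data).
D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = [2.0, 0.0, 0.5]                      # mostly concept 0, a little of concept 2
c = ista(x, D, lam=0.1, step=1.0)        # approximately [1.9, 0.0, 0.4]
edited_x = matvec(D, transplant_replace(c, source=0, target=1))
```

The l1 penalty is what makes the estimated concept weights sparse, so only concepts that are actually present receive nonzero coefficients; the shrinkage by `lam` visible in the toy output is the usual lasso bias.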
A High-Dimensional Statistical Theory for Convex and Nonconvex Matrix Sensing
ArXiv.org · 2025-06-25
preprint · Open access · Senior author
The problem of matrix sensing, or trace regression, is a problem wherein one wishes to estimate a low-rank matrix from linear measurements perturbed with noise. A number of existing works have studied both convex and nonconvex approaches to this problem, establishing minimax error rates when the number of measurements is sufficiently large relative to the rank and dimension of the low-rank matrix, though a precise comparison of these procedures remains unexplored. In this work we provide a high-dimensional statistical analysis for symmetric low-rank matrix sensing observed under Gaussian measurements and noise. Our main result describes a novel phenomenon: in this statistical model and in an appropriate asymptotic regime, the behavior of any local minimum of the nonconvex factorized approach (with known rank) is approximately equivalent to that of the matrix hard-thresholding of a corresponding matrix denoising problem, and the behavior of the convex nuclear-norm regularized least squares approach is approximately equivalent to that of matrix soft-thresholding of the same matrix denoising problem. Here "approximately equivalent" is understood in the sense of concentration of Lipschitz functions. As a consequence, the nonconvex procedure uniformly dominates the convex approach in mean squared error. Our arguments are based on a matrix operator generalization of the Convex Gaussian Min-Max Theorem (CGMT) together with studying the interplay between local minima of the convex and nonconvex formulations and their "debiased" counterparts, and several of these results may be of independent interest.
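The two matrix-denoising maps compared in the abstract above both act on singular values. A minimal sketch, working directly on a list of singular values rather than a full matrix (the surrounding SVD is omitted to keep the example dependency-free); the numbers are hypothetical.

```python
# Hard vs. soft thresholding of singular values (toy sketch).
# For a matrix Y = U diag(s) V^T, applying either map to s and reassembling
# gives the matrix version; the SVD itself is omitted to stay dependency-free.


def hard_threshold_rank(singular_values, rank):
    """Keep the `rank` largest singular values and zero the rest.

    In matrix form this is the best rank-`rank` approximation (Eckart-Young).
    """
    s = sorted(singular_values, reverse=True)
    return [v if i < rank else 0.0 for i, v in enumerate(s)]


def soft_threshold(singular_values, lam):
    """Shrink every singular value by `lam`, clipping at zero.

    In matrix form this solves min_X 0.5 * ||Y - X||_F^2 + lam * ||X||_*,
    i.e. nuclear-norm-regularized denoising.
    """
    s = sorted(singular_values, reverse=True)
    return [max(v - lam, 0.0) for v in s]


s = [3.0, 1.0, 0.5]                      # hypothetical noisy singular values
print(hard_threshold_rank(s, rank=1))    # [3.0, 0.0, 0.0]
print(soft_threshold(s, lam=0.5))        # [2.5, 0.5, 0.0]
```

With these toy numbers, soft thresholding shrinks the large singular value as well as the small ones, while hard thresholding with the known rank leaves it untouched. This only illustrates the qualitative difference between the two maps, not the paper's mean-squared-error comparison.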
bioRxiv (Cold Spring Harbor Laboratory) · 2025-01-22 · 3 citations
preprint · Open access
Background: Autism spectrum disorder (ASD), a condition defined by deficits in social communication, restricted interests, and repetitive behaviors, is associated with early impairments in motor imitation that persist through childhood and into adulthood. Alterations in the mirror neuron system (MNS), crucial for interpreting and imitating actions, may underlie these ASD-associated differences in motor imitation. High-density diffuse optical tomography (HD-DOT) overcomes logistical challenges of functional magnetic resonance imaging to enable identification of neural substrates of naturalistic motor imitation.
Objective: We aim to investigate brain function underlying motor observation and imitation in autistic and non-autistic adults. We hypothesize that HD-DOT will reveal greater activation in regions associated with the MNS during motor imitation than motor observation, and that MNS activity will negatively correlate with autistic traits and motor fidelity.
Methods: We imaged brain function using HD-DOT in N = 100 participants as they engaged in observing or imitating a sequence of arm movements. Additionally, during imitation, participant movements were simultaneously recorded with 3D cameras for computer-vision-based assessment of motor imitation (CAMI). Cortical responses were estimated using general linear models, and multiple regression was used to test for associations with autistic traits, assessed via the Social Responsiveness Scale-2 (SRS), and imitation fidelity, assessed via CAMI.
Results: Both observing and imitating motor movements elicited significant activations in higher-order visual and MNS regions, including the inferior parietal lobule, superior temporal gyrus, and inferior frontal gyrus. Imitation additionally exhibited greater activation in the superior parietal lobule, primary motor cortex, and supplementary motor area. Notably, the right temporal-parietal junction exhibited activation during observation but not during imitation. Higher autistic traits were associated with increased activation during motor observation in the right superior parietal lobule. No significant correlation between brain activation and CAMI scores was observed.
Conclusions: Our findings provide robust evidence of shared and task-specific cortical responses underlying motor observation and imitation, emphasizing the differential engagement of MNS regions during motor observation and imitation.
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
arXiv (Cornell University) · 2025-10-05
preprint · Open access · Senior author
Large Language Models (LLMs) are increasingly deployed in high-risk domains. However, state-of-the-art LLMs often exhibit hallucinations, raising serious concerns about their reliability. Prior work has explored adversarial attacks to elicit hallucinations in LLMs, but these methods often rely on unrealistic prompts, either by inserting nonsensical tokens or by altering the original semantic intent. Consequently, such approaches provide limited insight into how hallucinations arise in real-world settings. In contrast, adversarial attacks in computer vision typically involve realistic modifications to input images. However, the problem of identifying realistic adversarial prompts for eliciting LLM hallucinations remains largely underexplored. To address this gap, we propose Semantically Equivalent and Coherent Attacks (SECA), which elicit hallucinations via realistic modifications to the prompt that preserve its meaning while maintaining semantic coherence. Our contributions are threefold: (i) we formulate finding realistic attacks for hallucination elicitation as a constrained optimization problem over the input prompt space under semantic equivalence and coherence constraints; (ii) we introduce a constraint-preserving zeroth-order method to effectively search for adversarial yet feasible prompts; and (iii) we demonstrate through experiments on open-ended multiple-choice question answering tasks that SECA achieves higher attack success rates while incurring almost no semantic equivalence or semantic coherence errors compared to existing methods. SECA highlights the sensitivity of both open-source and commercial gradient-inaccessible LLMs to realistic and plausible prompt variations. Code is available at https://github.com/Buyun-Liang/SECA.
Tutorial on Recommendation with Generative Models (Gen-RecSys)
2025-02-26 · 15 citations
article
This intermediate-level tutorial, titled "Gen-RecSys", merges both industrial and academic perspectives on recent advances in Generative AI for recommender systems (beyond LLMs). It aims to highlight the transformative role of generative models in modern recommender systems, which have significantly impacted the AI field, particularly with the rise of large language models (LLMs) like ChatGPT, and have contributed to a rapid convergence of the fields of search, data mining, and recommendation. By providing attendees with a modern perspective on GenAI applications in recommendation, the tutorial will emphasize how generative models can drive recommendation by unlocking and interacting with rich data representations, including behavioral, textual, and multi-modal data, knowledge that is highly transferable across many applications of interest to the WSDM community. Participants will learn about the categorization of generative models in recommender systems based on underlying data modalities: (i) ID-based collaborative models, (ii) text-driven models such as LLMs, and (iii) multi-modal models. Within each category, various deep generative model paradigms (e.g., AR, GAN, diffusion models) will be introduced, along with insights into their application areas. The tutorial will also cover evaluation aspects, including benchmarks, metrics, and assessments of social and ethical impacts and harms. This tutorial presents a condensed version of the industrial and academic work featured in the forthcoming book at FntIR 2024-25, titled "Recommendation with Generative Models [7]," and of a shorter version prepared and presented by the team (see GenRecSys-Survey [6]).
Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation
ArXiv.org · 2025-02-28
preprint · Open access · Senior author
Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have led to significant progress in 2D body pose estimation. However, achieving a good balance between accuracy, efficiency, and robustness remains a challenge. For instance, CNNs are computationally efficient but struggle with long-range dependencies, while ViTs excel in capturing such dependencies but suffer from quadratic computational complexity. This paper proposes two ViT-based models for accurate, efficient, and robust 2D pose estimation. The first one, EViTPose, operates in a computationally efficient manner without sacrificing accuracy by utilizing learnable joint tokens to select and process a subset of the most important body patches, enabling us to control the trade-off between accuracy and efficiency by changing the number of patches to be processed. The second one, UniTransPose, while not allowing for the same level of direct control over the trade-off, efficiently handles multiple scales by combining (1) an efficient multi-scale transformer encoder that uses both local and global attention with (2) an efficient sub-pixel CNN decoder for better speed and accuracy. Moreover, by incorporating all joints from different benchmarks into a unified skeletal representation, we train robust methods that learn from multiple datasets simultaneously and perform well across a range of scenarios, including pose variations, lighting conditions, and occlusions. Experiments on six benchmarks demonstrate that the proposed methods significantly outperform state-of-the-art methods while improving computational efficiency. EViTPose exhibits a significant decrease in computational complexity (30% to 44% less in GFLOPs) with a minimal drop of accuracy (0% to 3.5% less), and UniTransPose achieves accuracy improvements ranging from 0.9% to 43.8% across these benchmarks.
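The idea of letting joint tokens pick a subset of body patches can be sketched in a few lines. The scoring rule here (a patch's importance is the largest softmax attention weight it receives from any joint token) and all names are assumptions for illustration, not EViTPose's published procedure; `k` plays the role of the number of patches to be processed.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_patches(joint_queries, patch_keys, k):
    """Indices (sorted) of the k patches that receive the most attention from joint tokens.

    Hypothetical rule: each joint token attends over all patches with scaled
    dot-product attention, and a patch's importance is its largest weight over joints.
    """
    d = len(patch_keys[0])
    importance = [0.0] * len(patch_keys)
    for q in joint_queries:
        logits = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d) for key in patch_keys]
        for p, weight in enumerate(softmax(logits)):
            importance[p] = max(importance[p], weight)
    ranked = sorted(range(len(patch_keys)), key=lambda p: importance[p], reverse=True)
    return sorted(ranked[:k])


# Toy example: 2 joint tokens, 4 patches with 2-D keys (hypothetical values).
joints = [[3.0, 0.0], [0.0, 3.0]]
patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(select_patches(joints, patches, k=2))   # [0, 1]
```

In a sketch like this, only the selected patches would be passed to later transformer layers, so lowering `k` reduces compute at some cost in accuracy.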
Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration
2025-10-19
preprint · Open access · Senior author
Image restoration aims to recover high-quality images from degraded observations. When the degradation process is known, the recovery problem can be formulated as an inverse problem, and in a Bayesian context, the goal is to sample a clean reconstruction given the degraded observation. Recently, modern pretrained diffusion models have been used for image restoration by modifying their sampling procedure to account for the degradation process. However, these methods often rely on certain approximations that can lead to significant errors and compromised sample quality. In this paper, we provide the first rigorous analysis of this approximation error for linear inverse problems under distributional assumptions on the space of natural images, demonstrating cases where previous works can fail dramatically. Motivated by our theoretical insights, we propose a simple modification to existing diffusion-based restoration methods. Our approach introduces a time-varying low-pass filter in the frequency domain of the measurements, progressively incorporating higher frequencies during the restoration process. We develop an adaptive curriculum for this frequency schedule based on the underlying data distribution. Our method significantly improves performance on challenging image restoration tasks including motion deblurring and image dehazing.
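The time-varying low-pass filter on the measurements can be sketched on a 1-D signal. This is a minimal illustration under stated assumptions: a naive DFT instead of an FFT library, a hypothetical linear cutoff schedule (`linear_cutoff`) standing in for the paper's adaptive, data-dependent curriculum, and no diffusion model at all.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Inverse DFT, returning the real part (valid for Hermitian-symmetric spectra)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def low_pass(y, cutoff):
    """Zero every DFT bin whose frequency index exceeds `cutoff`."""
    n = len(y)
    spectrum = dft(y)
    # min(k, n - k) is the frequency of bin k, so the mask keeps conjugate pairs together.
    return idft([spectrum[k] if min(k, n - k) <= cutoff else 0 for k in range(n)])


def linear_cutoff(step, num_steps, n):
    """Hypothetical linear curriculum: from DC only (step 0) to all bins (last step)."""
    return round((n // 2) * step / max(num_steps - 1, 1))


y = [1.0, 2.0, 3.0, 4.0]                 # toy 1-D "measurement"
for step in range(3):
    cutoff = linear_cutoff(step, 3, len(y))
    # A restoration sampler would compare low_pass(y, cutoff) with the
    # equally filtered forward model of the current estimate at this step.
    print(step, cutoff, [round(v, 6) for v in low_pass(y, cutoff)])
```

Early steps see only the coarse (low-frequency) content of the measurement and the last step sees all of it, which is the progressive behavior the abstract describes; how fast the cutoff grows is exactly what the paper's adaptive curriculum would decide.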
Nonconvex Linear System Identification with Minimal State Representation
ArXiv.org · 2025-04-26
preprint · Open access · Senior author
Low-order linear System IDentification (SysID) addresses the challenge of estimating the parameters of a linear dynamical system from finite samples of observations and control inputs with minimal state representation. Traditional approaches often utilize Hankel-rank minimization, which relies on convex relaxations that can require numerous, costly singular value decompositions (SVDs) to optimize. In this work, we propose two nonconvex reformulations to tackle low-order SysID: (i) Burer-Monteiro (BM) factorization of the Hankel matrix for efficient nuclear norm minimization, and (ii) optimizing directly over system parameters for real, diagonalizable systems with an atomic norm style decomposition. These reformulations circumvent the need for repeated heavy SVD computations, significantly improving computational efficiency. Moreover, we prove that optimizing directly over the system parameters yields lower statistical error rates and lower sample complexities that, unlike those of Hankel nuclear-norm minimization, do not scale linearly with trajectory length. Additionally, while our proposed formulations are nonconvex, we provide theoretical guarantees of achieving global optimality in polynomial time. Finally, we demonstrate algorithms that solve these nonconvex programs and validate our theoretical claims on synthetic data.
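The premise behind Hankel-rank formulations can be checked numerically: for a real, diagonalizable single-input single-output system, the impulse response is a sum of exponential "atoms", and the Hankel matrix built from it has rank equal to the number of poles, i.e. the minimal state dimension. The sketch below uses pure Python, toy pole and weight values, and a simple elimination-based rank test; it illustrates the model class, not the paper's Burer-Monteiro or atomic-norm algorithms.

```python
# Why Hankel rank encodes system order (toy sketch, hypothetical numbers).
# For a real, diagonalizable SISO system the impulse response is a sum of
# "atoms" h_k = sum_i w_i * p_i**k, with poles p_i and weights w_i = c_i * b_i.


def markov_parameters(poles, weights, length):
    """Impulse-response samples h_0..h_{length-1} of a diagonal SISO system."""
    return [sum(w * p ** k for p, w in zip(poles, weights)) for k in range(length)]


def hankel(h, rows, cols):
    """Hankel matrix H[i][j] = h[i + j]."""
    return [[h[i + j] for j in range(cols)] for i in range(rows)]


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (entries below tol count as zero)."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


h = markov_parameters(poles=[0.9, 0.5], weights=[1.0, -0.5], length=7)
print(numerical_rank(hankel(h, 4, 4)))   # 2: the minimal state dimension
```

A convex method would penalize the nuclear norm of this Hankel matrix as a proxy for its rank, which needs SVDs; parameterizing directly by poles and weights, as `markov_parameters` does, is the kind of representation that avoids them.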
ArXiv.org · 2025-04-24
preprint · Open access · Senior author
Continual learning is an emerging subject in machine learning that aims to solve multiple tasks presented sequentially to the learner without forgetting previously learned tasks. Recently, many deep-learning-based approaches have been proposed for continual learning; however, the mathematical foundations behind existing continual learning methods remain underdeveloped. On the other hand, adaptive filtering is a classic subject in signal processing with a rich history of mathematically principled methods. However, its role in understanding the foundations of continual learning has been underappreciated. In this tutorial, we review the basic principles behind both continual learning and adaptive filtering, and present a comparative analysis that highlights multiple connections between them. These connections allow us to enhance the mathematical foundations of continual learning based on existing results for adaptive filtering, extend adaptive filtering insights using existing continual learning methods, and discuss a few research directions for continual learning suggested by the historical developments in adaptive filtering.
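The least-mean-squares (LMS) filter, a textbook adaptive-filtering algorithm chosen here for illustration, already exhibits the forgetting problem the abstract describes: run on a sequence of two toy regression "tasks", it tracks the latest task and overwrites the earlier one. The data, the step size `mu`, and the helper names are assumptions made for this sketch.

```python
import random


def lms(samples, num_taps, mu):
    """Least-mean-squares adaptive filter: w <- w + mu * e * x, one sample at a time."""
    w = [0.0] * num_taps
    for x, d in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))     # filter output
        e = d - y                                    # a-priori error
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
    return w


def make_task(true_w, n, rng):
    """Noiseless regression stream d = <true_w, x> with x uniform in [-1, 1] (toy data)."""
    samples = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        samples.append((x, sum(a * b for a, b in zip(true_w, x))))
    return samples


rng = random.Random(0)
task_a = make_task([1.0, -2.0], 2000, rng)
task_b = make_task([0.5, 0.5], 2000, rng)

w_after_a = lms(task_a, num_taps=2, mu=0.1)             # close to [1.0, -2.0]
w_after_ab = lms(task_a + task_b, num_taps=2, mu=0.1)   # close to [0.5, 0.5]: task A is overwritten
```

The same step size that lets LMS adapt quickly to a new task is what erases the old one, a tracking-versus-retention trade-off that mirrors the stability-plasticity tension in continual learning.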
Imaging Neuroscience · 2025-01-01 · 3 citations
article · Open access
Autism spectrum disorder (ASD), a condition defined by deficits in social communication, restricted interests, and repetitive behaviors, is associated with early impairments in motor imitation that persist through childhood and into adulthood. Alterations in the mirror neuron system (MNS), crucial for interpreting and imitating actions, may underlie these ASD-associated differences in motor imitation. High-density diffuse optical tomography (HD-DOT) overcomes logistical challenges of functional magnetic resonance imaging to enable identification of neural substrates of naturalistic motor imitation. We aim to investigate brain function underlying motor observation and imitation in autistic and non-autistic adults. We hypothesize that HD-DOT will reveal greater activation in regions associated with the MNS during motor imitation than during motor observation, and that MNS activity will negatively correlate with autistic traits and motor fidelity. We imaged brain function using HD-DOT in N = 100 participants (19 ASD and 81 non-autistic individuals) as they engaged in observing or imitating a sequence of arm movements. Additionally, during imitation, participant movements were simultaneously recorded with 3D cameras for computerized assessment of motor imitation (CAMI). Cortical responses were estimated using general linear models, and multiple regression was used to test for associations with autistic traits, assessed via the Social Responsiveness Scale-2 (SRS-2), and imitation fidelity, assessed via CAMI. Both observing and imitating motor movements elicited significant activations in higher-order visual and MNS regions, including the inferior parietal lobule, superior temporal gyrus, and inferior frontal gyrus. Imitation additionally exhibited greater activation in the superior parietal lobule, primary motor cortex, and supplementary motor area. Notably, the right temporal-parietal junction exhibited activation during observation but not during imitation. Higher presence of autistic traits was associated with increased activation during motor observation in the right superior parietal lobule. No significant associations between brain activation and CAMI scores were observed. Our findings provide robust evidence of shared and task-specific cortical responses underlying motor observation and imitation, emphasizing the differential engagement of MNS regions during motor observation and imitation.
Recent grants
CRS--EHS: Collaborative Research: An Algebraic Geometric Approach to Hybrid Systems Identification
NSF · $200k · 2005–2008
NSF · $391k · 2013–2016
NSF · $493k · 2010–2012
BIGDATA: F: DKA: Learning a Union of Subspaces from Big and Corrupted Data
NSF · $609k · 2014–2018
SCH: A Computer Vision and Lens-Free Imaging System for Automatic Monitoring of Infections
NIH · $1.1M · 2019–2024
Frequent coauthors
- Benjamin D. Haeffele · Johns Hopkins University · 47 shared
- Daniel P. Robinson · Lehigh University · 41 shared
- S. Shankar Sastry · 37 shared
- Manolis C. Tsakiris · University of Chinese Academy of Sciences · 30 shared
- Yi Ma · Shaoyang University · 28 shared
- Chong You · Universiti Tunku Abdul Rahman · 26 shared
- Roberto Tron · 24 shared
- Gregory D. Hager · Johns Hopkins University · 22 shared
Education
- 2003
Ph.D., Electrical Engineering and Computer Sciences
University of California, Berkeley
- 2000
M.S., Electrical Engineering and Computer Sciences
University of California, Berkeley
- 1997
B.S., Electrical Engineering
Pontificia Universidad Católica de Chile
Awards & honors
- Penn Integrates Knowledge University Professor