
Miguel P. Eckstein
Professor · University of California, Santa Barbara · Neuroscience
Active 1992–2026
About
Miguel P. Eckstein is a Professor of Psychological & Brain Sciences at UC Santa Barbara. He holds a bachelor's degree in Physics and Psychology from UC Berkeley and a doctoral degree in Cognitive Psychology from UCLA. Before joining UC Santa Barbara, he worked at the Department of Medical Physics and Imaging at Cedars Sinai Medical Center and at NASA Ames Research Center. His awards include the Optical Society of America Young Investigator Award, the Society for Optical Engineering (SPIE) Image Perception Cum Laude Award, the Cedars Sinai Young Investigator Award, the National Science Foundation CAREER Award, the National Academy of Sciences Troland Award, and a Guggenheim Fellowship. He has served as chair of the Vision Technical Group of the Optical Society of America, chair of the Human Performance, Image Perception and Technology Assessment conference of the SPIE Medical Imaging Annual Meeting, Vision Editor of the Journal of the Optical Society of America A, and member of the board of editors of the Journal of Vision. He has also participated in NIH study section panels on Mechanisms of Sensory, Perceptual and Cognitive Processes and on Biomedical Imaging Technology. Eckstein has published over 170 articles across a range of disciplines, focusing on computational human vision, visual attention, search, perceptual learning, and the perception of medical images. His research employs behavioral psychophysics, eye tracking, EEG, fMRI, and computational modeling to understand how the brain performs everyday perceptual tasks such as recognizing faces and objects and performing visual searches.
His work aims to elucidate basic visual perception, eye movements, visual attention, perceptual learning, and decision making, with applications including improving medical image diagnosis, developing bio-inspired computer vision systems, and enhancing human-robot interactions.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Cognitive science
- Natural Language Processing
- Sociology
- Psychology
- Epistemology
- Neuroscience
- Algorithm
- Linguistics
- Cognitive psychology
- Programming language
- Computer vision
- Philosophy
- Medicine
Selected publications
Relating the perceived useful field of view to visual search in 2D Images and 3D volumetric images
Attention Perception & Psychophysics · 2026-04-13
article · Senior author
ArXiv.org · 2026-02-18
article · Open access · Senior author
We introduce IRIS (Intent Resolution via Inference-time Saccades), a novel training-free approach that uses eye-tracking data in real-time to resolve ambiguity in open-ended VQA. Through a comprehensive user study with 500 unique image-question pairs, we demonstrate that fixations closest to the time participants start verbally asking their questions are the most informative for disambiguation in Large VLMs, more than doubling the accuracy of responses on ambiguous questions (from 35.2% to 77.2%) while maintaining performance on unambiguous queries. We evaluate our approach across state-of-the-art VLMs, showing consistent improvements when gaze data is incorporated in ambiguous image-question pairs, regardless of architectural differences. We release a new benchmark dataset to use eye movement data for disambiguated VQA, a novel real-time interactive protocol, and an evaluation suite.
Open MIND · 2026-02-18
preprint · Senior author
A Deep Learning Framework for Predicting Functional Visual Performance in Bionic Eye Users
bioRxiv (Cold Spring Harbor Laboratory) · 2025-06-25
preprint · Open access
Efforts to restore vision via neural implants have outpaced the ability to predict what users will perceive, leaving patients and clinicians without reliable tools for surgical planning or device selection. To bridge this critical gap, we introduce a computational virtual patient (CVP) pipeline that integrates anatomically grounded phosphene simulation with task-optimized deep neural networks (DNNs) to forecast patient perceptual capabilities across diverse prosthetic designs and tasks. We evaluate performance across six visual tasks, six electrode configurations, and two artificial vision models, positioning our CVP approach as a scalable pre-implantation method. Several chosen tasks align with the Functional Low-Vision Observer Rated Assessment (FLORA), revealing correspondence between model-predicted difficulty and real-world patient outcomes. Further, DNNs exhibited strong correspondence with psychophysical data collected from normally sighted subjects viewing phosphene simulations, capturing both overall task difficulty and performance variation across implant configurations. While performance was generally aligned, DNNs sometimes diverged from humans in which specific stimuli were misclassified, reflecting differences in underlying decision strategies between artificial agents and human observers. The findings position CVP as a scientific tool for probing perception under prosthetic vision, an engine to inform device development, and a clinically relevant framework for pre-surgical forecasting.
INTERLACE: Interleaved Layer Pruning and Efficient Adaptation in Large Vision-Language Models
ArXiv.org · 2025-11-24
preprint · Open access
We introduce INTERLACE, a novel framework that prunes redundant layers in VLMs while maintaining performance through sample-efficient finetuning. Existing layer pruning methods lead to significant performance drop when applied to VLMs. Instead, we analyze triplets of consecutive layers to identify local redundancy, removing the most redundant of the first two layers, finetune the remaining layer to compensate for the lost capacity, and freeze the third layer to serve as a stable anchor during finetuning. We found that this interleaved finetune-freeze design enables rapid convergence with minimal data after pruning. By finetuning only a subset of layers on just 1% of the FineVision dataset for one epoch, Interlace achieves 88.9% average performance retention after dropping 25% of the network, achieving SOTA performance. Our code is available at: https://github.com/pmadinei/Interlace.git
Journal of Medical Imaging · 2025-10-16
article · Senior author
Purpose: We aim to assess the perceptual tasks in which convolutional neural networks (CNNs) might be better tools than commonly used linear model observers (LMOs) to evaluate medical image quality.
Approach: We compared the LMOs (channelized Hotelling [CHO] and frequency convolution channels observers [FCO]) and CNN detection accuracies for tasks with a few possible signal locations (location known exactly) and for the search for mass and microcalcification signals embedded in 2D/3D breast tomosynthesis phantoms. We also compared the LMOs and CNN accuracies to those of radiologists in the search tasks. We analyzed radiologists’ eye position to assess whether they fixate longer at locations considered suspicious by the LMOs or those by the CNN.
Results: LMOs resulted in similar detection accuracies [area under the receiver operating characteristic curve (AUC)] to the CNN for tasks with up to 100 signal locations but lower accuracies in the search task for microcalcification and mass 3D images. Radiologists’ AUC was significantly higher (p<1e−4) than that of LMOs for the microcalcification 2D search (CHO, FCO) and 3D mass search (p<0.05, CHO) but was not higher than the CNN’s AUC. For both signal types, radiologists fixated longer on the locations of the highest response scores of the CNN than those of the LMOs but only reached statistical significance for the mass (masses: p=0.009 versus CHO and p=0.004 versus FCO).
Conclusion: We show that CNNs are a more suitable model observer for search tasks. Like radiologists but not traditional LMOs, CNNs can discount false positives arising from anatomical backgrounds.
2025-09-17
preprint · Open access · Senior author
Accurate face recognition relies on discerning subtle differences in the shapes (featural information) and relative positions (configural information) of facial features. When recognizing faces, most humans consistently land their first fixation on the face just below the eyes (upper lookers), but some individuals fixate closer to the mouth (lower lookers). Do differences in long-term eye movement strategies influence the ability to process featural and configural facial information? To investigate this, we tested face recognition in upper and lower lookers using specially designed face stimuli that isolated variation in featural or configural information across exemplars, while fixating either at the eye or mouth regions. We found that upper lookers had superior recognition for configural faces than lower lookers, especially when fixating on the eyes. Using single-feature faces, we found that the upper lookers’ advantage arises from efficient extraction of configural face information from the eye region.
The psychophysics of dynamic gaze-following saccades during search
Journal of Vision · 2025-12-17
article · Open access · Senior author
The ability to quickly and precisely follow another person's gaze reflects critical evolutionary mechanisms underlying social interactions, such as attention modulation and the prediction of others' future actions. Recent studies show that observers use another person's gaze direction and peripheral scene information to make anticipatory saccades toward the gaze goal. However, it remains unclear how these eye movements are influenced by complex features of natural scenes, such as a foveal gazer, multiple peripheral gaze goals, and the relative distance between gazer and goal. We presented dynamic stimuli (videos) of real-world scenes with or without a gazer shifting their head to gaze at other individuals (gaze goals). Participants were instructed to search for a specific target individual in the videos while their eye movements were recorded. We measured the accuracy of the first saccade in locating the gaze goal. First, we found that the absence of a foveal gazer significantly increased saccade error, but only when the goal was at least approximately 9 degrees of visual angle from the initial fixation. First saccade amplitude and onset latency were higher in the gazer-present condition. Second, when there were multiple potential gaze goals in the periphery, the first saccade was directed to the individual closer to the initial fixation (gazer) location. Finally, the presence of multiple peripheral gaze goals shortened saccade latencies and increased the frequency of anticipatory saccades made before the gazer completed their head movement. These findings extend our understanding of gaze following in complex, naturalistic scenes and inform theories of attention and real-world decision-making.
Semantic Saliency from Multi-Modal Large Language Model Scene Understanding Maps
2025-08-01
preprint · Open access · Senior author
Image-computable low-level image saliency has shaped perceptual psychology and computer science, but is limited by its inability to capture human high-level cognitive factors that influence attention and eye movements. A recent paper has shown that while free-viewing scenes, participants fixate on objects that are critical to the understanding of scenes. Here, we propose a fully automated method (with no fitting parameters or training) to create scene understanding maps (SUMs) that visualize the quantitative contribution of an object to the human comprehension of the scene. The method (AUTO-SUM) uses auto-segmentation and removal of different objects from scenes, Multi-Modal Large Language Models (MLLMs) to describe the scenes, and semantic similarity measures using large language embedding representations of scene descriptions. We show that AUTO-SUMs can approximate H-SUMs estimated using human-operated segmentation and object removal, human scene descriptions, and sentence similarity ratings. We also show that AUTO-SUM can predict the object most fixated by human observers during free viewing and scene description tasks better than a saliency model (GBVS) and comparable to DeepGaze. We contend that AUTO-SUM can be used as a semantic saliency model that complements lower-level saliency models.
2025-07-23
preprint · Open access · Senior author
Recent grants
NIH · $2.3M · 2011
Assessment of medical image quality with foveated search models
NIH · $1.7M · 2015–2020
CAREER: Quantitative Evaluation Of Attention Models Of Visual Search
NSF · $392k · 2002–2008
Perceptual Learning: Human vs. Optimal Bayesian
NIH · $1.8M · 2004–2014
Predictive Cues and Multiple Fixation Search
NSF · $297k · 2008–2012
Frequent coauthors
- Craig K. Abbey · University of California, Santa Barbara · 139 shared
- Barry Giesbrecht · University of California, Santa Barbara · 41 shared
- James S. Whiting · Maine Medical Center · 40 shared
- Steven S. Shimozaki · University of Leicester · 40 shared
- François Bochud · Institute of Radiation Physics · 33 shared
- William Yang Wang · 26 shared
- Brent R. Beutter · Ames Research Center · 25 shared
- Miguel A. Lago · 24 shared
Labs
Vision and Image Understanding Lab · PI
Education
B.A., Physics and Psychology
UC Berkeley
Ph.D., Cognitive Psychology
UCLA
Awards & honors
- Optical Society of America Young Investigator Award
- Society for Optical Engineering (SPIE) Image Perception Cum…
- Cedars Sinai Young Investigator Award
- National Science Foundation CAREER Award
- National Academy of Sciences Troland Award