Sarah Ostadabbas

Verified

Northeastern University · Electrical and Computer Engineering

Active 2007–2026

h-index: 24

Citations: 2.0k

Papers: 174 (87 in the last 5 years)

Funding: $1.6M (2 active grants)

About

Sarah Ostadabbas is an associate professor of electrical and computer engineering at Northeastern University's College of Engineering. Her research focuses on multi-object tracking, particularly in scenarios with occlusions and partial views, and on AI systems that can estimate hidden objects or persons in a scene. This work aims to help computer vision algorithms interpret scenes where objects are not directly visible, with applications in areas such as autonomous systems and health monitoring. She collaborates with interdisciplinary teams and with students such as Bishoy Galoaa, who has contributed to projects on tracking in two- and three-dimensional spaces and on non-invasive autism detection systems for infants and toddlers. Her research has led to publications at conferences such as WACV 2025 and the International Conference on Machine Learning (ICML) 2025, and her lab has also built practical applications such as the toddler tracking app AiWover. Her work is driven by a focus on 'forgotten or neglected problems' in AI, with an emphasis on impactful, real-world solutions.

Research topics

Artificial Intelligence
Computer Science
Machine Learning
Information Retrieval
Computer vision
Data science
Engineering
Mathematics
Management science

Selected publications

Brain Network Connectivity During Resting-State and a Visuospatial Task as a Biomarker for Spatial Neglect in Stroke Patients
Neurorehabilitation and neural repair · 2026-03-03
article · Open access
Background: Spatial neglect (SN) is a common visual attention deficit affecting stroke patients due to large-scale disruptions within brain networks. Most studies have focused only on resting-state, but effective rehabilitation requires a clearer understanding of how brain networks change during visuospatial tasks. Objective: This study aims to identify network disruptions associated with neglect by comparing resting-state and task-based electroencephalography (EEG) connectivity patterns in stroke patients with and without neglect. Methods: We recorded EEG data from 28 stroke patients using the augmented reality (AR)-based EEG-guided neglect detection system (AREEN) during resting-state and a visuospatial task. Connectivity was measured using coherence in delta, theta, alpha, and beta bands for both conditions, with gamma-band coherence assessed only during the task. Graph-based metrics were applied to model network-level disruptions. Classification models evaluated the significance of connectivity features to find patterns predictive of neglect. Results: The neglect group showed reduced connectivity in frontal and right parieto-occipital (ParOcc) regions, primarily in beta and theta bands, during both conditions, with additional gamma-band connectivity differences in the task condition, compared to the non-neglect group. Conversely, connectivity was greater in central and midline regions, which may indicate a maladaptive shift in network organization. Classification models accurately classified patients into neglect and non-neglect groups (resting-state: 87.0% ± 0.7%; task: 80.9% ± 16.0%). Feature importance analysis identified eigenvector and closeness centrality within frontal, right ParOcc, and central regions as key predictors. Conclusions: Network disruptions can effectively identify SN and provide potential targets for connectivity-based rehabilitation. Future studies should investigate whether these interventions improve attention and recovery in stroke patients. This study was registered at ClinicalTrials.gov under ID NCT04187131.
PhyGround: Benchmarking Physical Reasoning in Generative World Models
arXiv (Cornell University) · 2026-05-11
preprint · Open access
Generative world models are increasingly used for video generation, where learned simulators are expected to capture the physical rules that govern real-world dynamics. However, evaluating whether generated videos actually follow these rules remains challenging. Existing physics-focused video benchmarks have made important progress, but they still face three key challenges, including the coarse evaluation frameworks that hide law-specific failures, response biases and fatigue that undermine the validity of annotation judgments, and automated evaluators that are insufficiently physics-aware or difficult to audit. To address those challenges, we introduce PhyGround, a criteria-grounded benchmark for evaluating physical reasoning in video generation. The benchmark contains 250 curated prompts, each augmented with an expected physical outcome, and a taxonomy of 13 physical laws across solid-body mechanics, fluid dynamics, and optics. Each law is operationalized through observable sub-questions to enable per-law diagnostics. We evaluate eight modern video generation models through a large-scale, quality-controlled human study, grounded on social science lab experiment design. A total of 459 annotators provided 5,796 complete annotations and over 37.4K fine-grained labels; after quality control, the retained annotations exhibited high split-half model-ranking correlations (Spearman's rho > 0.90). To support reproducible automated evaluation, we release PhyJudge-9B, an open physics-specialized VLM judge. PhyJudge-9B achieves substantially lower aggregate relative bias than Gemini-3.1-Pro (3.3% vs. 16.6%). We release prompts, human annotations, model checkpoints, and evaluation code on the project page https://phyground.github.io/.
PanoWorld: Geometry-Consistent Panoramic Video World Modeling
ArXiv.org · 2026-05-14
article · Open access · Senior author
We present PanoWorld, a panoramic video world model that generates geometry-consistent 360° video from a single image and a caption. Existing panoramic video methods optimize primarily for visual realism and do not explicitly constrain the underlying 3D scene state, producing outputs that appear plausible yet exhibit inconsistent depth, broken correspondences, and implausible motion across the spherical surface. We address this gap by framing panoramic video generation as a geometry- and dynamics-consistent latent state modeling problem rather than pure visual synthesis. Building on a pre-trained perspective video world model, we introduce two lightweight regularizers: a depth consistency loss against pseudo ground-truth panoramic depth, and a trajectory consistency loss that supervises the 3D world-frame positions of tracked points across time. We further apply spherical-geometry-aware adaptation to the conditioning and positional encoding. We additionally introduce PanoGeo, a unified geometry-aware panoramic video dataset with consistent depth, trajectory, and prompt annotations across diverse real and synthetic sources, used for both training and stratified evaluation. Experiments show that PanoWorld improves geometric consistency over prior panoramic generation methods while maintaining competitive visual realism, establishing that panoramic video generation must be treated as a geometric modeling problem to support the holistic spatial understanding requirements of embodied AI applications. Code is available at https://github.com/ostadabbas/PanoWorld.
UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking
Open MIND · 2026-02-04
preprint · Senior author
We present UniTrack, a plug-and-play graph-theoretic loss function designed to significantly enhance multi-object tracking (MOT) performance by directly optimizing tracking-specific objectives through unified differentiable learning. Unlike prior graph-based MOT methods that redesign tracking architectures, UniTrack provides a universal training objective that integrates detection accuracy, identity preservation, and spatiotemporal consistency into a single end-to-end trainable loss function, enabling seamless integration with existing MOT systems without architectural modifications. Through differentiable graph representation learning, UniTrack enables networks to learn holistic representations of motion continuity and identity relationships across frames. We validate UniTrack across diverse tracking models and multiple challenging benchmarks, demonstrating consistent improvements across all tested architectures and datasets including Trackformer, MOTR, FairMOT, ByteTrack, GTR, and MOTE. Extensive evaluations show up to 53% reduction in identity switches and 12% IDF1 improvements across challenging benchmarks, with GTR achieving peak performance gains of 9.7% MOTA on SportsMOT.
Overcoming Small Data Limitations in Video-Based Infant Respiration Estimation
2026-03-06
article
The development of contactless respiration monitoring for infants could enable advances in the early detection and treatment of breathing irregularities, which are associated with neurodevelopmental impairments and conditions like sudden infant death syndrome (SIDS). But while respiration estimation for adults is supported by a robust ecosystem of computer vision algorithms and video datasets, only one small public video dataset with annotated respiration data for infant subjects exists, and there are no reproducible algorithms which are effective for infants. We introduce the annotated infant respiration dataset of 400 videos (AIR-400), contributing 275 new, carefully annotated videos from 10 recruited subjects to the public corpus. We develop the first reproducible pipelines for infant respiration estimation, based on infant-specific region-of-interest detection and spatiotemporal neural processing enhanced by optical flow inputs. We establish, through comprehensive experiments, the first reproducible benchmarks for the state-of-the-art in vision-based infant respiration estimation. We make our dataset, code repository, and trained models available for public use.
A scalable EEG-based spatial neglect detection system in augmented reality for stroke patients
Journal of Neuroscience Methods · 2026-04-01 · 1 citation
article · Open access
BACKGROUND: Spatial neglect is a common visuospatial attention disorder following a stroke. To overcome weaknesses associated with classic pen-and-paper tests used in some clinical settings, we developed AREEN: an AR-guided EEG-based Neglect detection system. AREEN previously demonstrated that the EEG activity of patients with neglect was distinguishable from that of patients without neglect. However, to use this system practically, it would need to be able to diagnose neglect in new patients who have not been seen before, meaning the system should be able to generalize neglect detection. NEW METHOD: In this study, we investigate the scalability of AREEN across individuals using multiple classification models. To determine the best classifier, four models (logistic regression, linear discriminant analysis, random forest, boosted tree) were tested and cross-validated with a leave-one-participant-out strategy. RESULTS: The boosted tree model resulted in the highest average within-participant accuracies (proportion of EEG trials correctly classified) for both neglect, with a 76.0% average accuracy, and non-neglect, with a 68.2% average accuracy. It also yielded the highest within-group accuracies (proportion of patients within each group that correctly classified above 50% of EEG trials) for neglect 90.9% were correctly grouped, and for non-neglect 90.0%. CONCLUSION: The application of this model would allow for accurate identification of spatial neglect, which could be crucial for determining stroke rehabilitation therapies. Patients also expressed high satisfaction, comfort, and willingness to continue using the system, based on responses to a questionnaire. Future developments of AREEN will aim to rehabilitate neglect by performing neglect detection in real-time with neurofeedback.
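The leave-one-participant-out evaluation and the two accuracy measures defined in the abstract above can be illustrated with a small sketch. This is not the AREEN authors' code: the data layout, function names, and toy predictions below are all hypothetical, and the predictions are hard-coded rather than produced by a boosted tree or any other trained model. It also assumes that every trial from a participant carries that participant's group label.

```python
from collections import defaultdict


def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_idx, test_idx), holding out one participant at a time."""
    for held_out in sorted(set(participant_ids)):
        train = [i for i, p in enumerate(participant_ids) if p != held_out]
        test = [i for i, p in enumerate(participant_ids) if p == held_out]
        yield held_out, train, test


def within_participant_accuracy(participant_ids, y_true, y_pred):
    """Proportion of each participant's trials that were classified correctly."""
    correct, total = defaultdict(int), defaultdict(int)
    for p, truth, pred in zip(participant_ids, y_true, y_pred):
        total[p] += 1
        correct[p] += int(truth == pred)
    return {p: correct[p] / total[p] for p in total}


def within_group_accuracy(per_participant_acc, group_of, threshold=0.5):
    """Proportion of participants in each group whose accuracy exceeds the threshold."""
    hits, sizes = defaultdict(int), defaultdict(int)
    for p, acc in per_participant_acc.items():
        sizes[group_of[p]] += 1
        hits[group_of[p]] += int(acc > threshold)
    return {g: hits[g] / sizes[g] for g in sizes}


# Hypothetical toy data: 3 participants with 4 EEG trials each.
participants = ["p1"] * 4 + ["p2"] * 4 + ["p3"] * 4
group_of = {"p1": "neglect", "p2": "neglect", "p3": "non-neglect"}
y_true = [group_of[p] for p in participants]  # assumed: trial label = participant's group
y_pred = [
    "neglect", "neglect", "neglect", "non-neglect",          # p1: 3 of 4 correct
    "neglect", "non-neglect", "non-neglect", "non-neglect",  # p2: 1 of 4 correct
    "non-neglect", "non-neglect", "non-neglect", "neglect",  # p3: 3 of 4 correct
]

splits = list(leave_one_participant_out(participants))
per_participant = within_participant_accuracy(participants, y_true, y_pred)
per_group = within_group_accuracy(per_participant, group_of)
print(per_participant)
print(per_group)
```

In a real pipeline, each split would train a model on `train_idx` and predict only the held-out participant's trials, so no participant contributes to both training and testing. That separation is what makes the reported accuracies an estimate of performance on unseen patients.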
Dual-Conditioned Temporal Diffusion Modeling for Driving Scene Generation
2025-05-19
article · Senior author
Diffusion models have proven effective at generating high-quality images from learned distributions, but their application to the temporal domain, especially for driving scenarios, remains underexplored. Our work addresses key challenges in existing simulations, such as limited data quality, diversity, and high costs, by extending diffusion models to generate realistic long driving videos. We introduce the Dual-conditioned Temporal Diffusion Model (DcTDM), an open-source method that incorporates dual conditioning to enforce temporal consistency by guiding frame transitions. Alongside DcTDM, we present DriveSceneDDM, a comprehensive driving video dataset featuring textual scene descriptions, dense depth maps, and Canny edge data. We evaluate DcTDM using common video quality metrics, demonstrating its superior performance over other video diffusion models by producing long, temporally consistent driving videos up to 40s, achieving over 25% improvement in consistency and frame quality.

Recent grants

SCH: INT: Collaborative Research: Detection, Assessment and Rehabilitation of Stroke-Induced Visual Neglect Using Augmented Reality (AR) and Electroencephalography (EEG)
NSF · $394k · 2019–2024
CHS: Small: Collaborative Research: A Graph-Based Data Fusion Framework Towards Guiding A Hybrid Brain-Computer Interface
NSF · $190k · 2020–2024
Collaborative Research: Development of a precision closed loop BCI for socially fearful teens with depression and anxiety
NSF · $150k · 2023–2026
CAREER: Learning Visual Representations of Motor Function in Infants as Prodromal Signs for Autism
NSF · $600k · 2022–2027
NRI: EAGER: Teaching Aerial Robots to Perch Like a Bat via AI-Guided Design and Control
NSF · $102k · 2019–2021

Frequent coauthors

Xiaofei Huang
Northeastern University
31 shared
Shuangjun Liu
Northeastern University
28 shared
Behnaz Rezaei
Qualcomm (United Kingdom)
24 shared
Mehrdad Nourani
The University of Texas at Dallas
23 shared
Amirreza Farnoosh
Northeastern University
20 shared
Shaotong Zhu
16 shared
Murat Akçakaya
University of Pittsburgh
13 shared
M. Pompeo
Texas Health Dallas
11 shared

Awards & honors

NSF CAREER Award (2022)
Sony Faculty Innovation Award (2023)
Cade Prize for Inventivity in the Technology category (2024)
Runner-up for the Oracle Excellence Award (2023)
One of the 120+ Women Spearheading Advances in Visual Tech a…
