Scott T. Acton

Chair, Charles L. Brown Department of Electrical and Computer Engineering · American Telephone and Telegraph Company Professor of Engineering · Professor, Biomedical Engineering (By Courtesy)

University of Virginia · Electrical and Computer Engineering

Active 1990–2026

h-index: 42

Citations: 9.7k

Papers: 423 · 54 last 5y

Funding: $1.8M · 1 active



About

Professor Scott T. Acton is the Chair of the Charles L. Brown Department of Electrical and Computer Engineering at the University of Virginia and holds the title of American Telephone and Telegraph Company Professor of Engineering. He is also a Professor of Biomedical Engineering by courtesy. Professor Acton leads the Virginia Image and Video Analysis (VIVA) laboratory at UVA, which specializes in biological image analysis problems. The research focus of VIVA includes machine learning techniques for image and video analysis, artificial intelligence applications in education, as well as tracking, segmentation, and enhancement of images and videos. Under his leadership, the lab pursues advancements in these areas to address complex challenges in biological imaging and related fields.

Research topics

Computer Science
Artificial Intelligence
Physics
Biology
Astronomy
Computer vision
Biological system
History
Optics

Selected publications

Deep Temporal Sequence Classification and Mathematical Modeling for Cell Tracking in Dense 3D Microscopy Videos of Bacterial Biofilms
IEEE Transactions on Computational Biology and Bioinformatics · 2026-02-11
article · Senior author
Automatic cell tracking in dense environments is plagued by inaccurate correspondences and misidentification of parent-offspring relationships. In this paper, we introduce a novel cell tracking algorithm named DenseTrack, which integrates deep learning with mathematical model-based strategies to effectively establish correspondences between consecutive frames and detect cell division events in crowded scenarios. We formulate the cell tracking problem as a deep learning-based temporal sequence classification task followed by solving a constrained one-to-one matching optimization problem exploiting the classifier's confidence scores. Additionally, we present an eigendecomposition-based cell division detection strategy that leverages knowledge of cellular geometry. The performance of the proposed approach has been evaluated by tracking densely packed cells in 3D time-lapse image sequences of bacterial biofilm development. The experimental results on simulated as well as experimental fluorescence image sequences suggest that the proposed tracking method achieves superior performance in terms of both qualitative and quantitative evaluation measures compared to recent state-of-the-art cell tracking approaches.
DEMIX: Dual-Encoder Latent Masking Framework for Mixed Noise Reduction in Ultrasound Imaging
Open MIND · 2026-02-06
preprint · Senior author
Ultrasound imaging is widely used in noninvasive medical diagnostics due to its efficiency, portability, and avoidance of ionizing radiation. However, its utility is limited by the quality of the signal. Signal-dependent speckle noise, signal-independent sensor noise, and non-uniform spatial blurring caused by the transducer and modeled by the point spread function (PSF) degrade the image quality. These degradations challenge conventional image restoration methods, which assume simplified noise models, and highlight the need for specialized algorithms capable of effectively reducing the degradations while preserving fine structural details. We propose DEMIX, a novel dual-encoder denoising framework with a masked gated fusion mechanism, for denoising ultrasound images degraded by mixed noise and further degraded by PSF-induced distortions. DEMIX is inspired by diffusion models and is characterized by a forward process and a deterministic reverse process. DEMIX adaptively assesses the different noise components, disentangles them in the latent space, and suppresses these components while compensating for PSF degradations. Extensive experiments on two ultrasound datasets, along with a downstream segmentation task, demonstrate that DEMIX consistently outperforms state-of-the-art baselines, achieving superior noise suppression and preserving structural details. The code will be made publicly available.
DEMIX: Dual-Encoder Latent Masking Framework for Mixed Noise Reduction in Ultrasound Imaging
arXiv (Cornell University) · 2026-02-06
article · Open access · Senior author
FABLE: Florence-2–Assisted Behavioral Learning and Embedding for Multilabel Action Recognition
2025-10-26
article · Senior author
Understanding complex activities in a scene requires capturing subtle actor–actor and actor–object interactions, a problem made significantly harder when restricted to a single video frame. Generative Visual Language Models (VLMs) have a salient ability to construct coherent captions, identify object regions, or even identify objects from a given phrase. Florence-2–Assisted Behavioral Learning and Embedding (FABLE) for Multilabel Action Recognition combines Florence-2’s region-based visual grounding with discriminative text-embedding cues to understand the teacher–student interactions and individual activities within an elementary classroom environment. Analogous to CLIP, FABLE employs a cross-similarity mechanism to align visual and textual representations, generating a logit distribution that identifies the most probable actions within each frame. However, FABLE incorporates Florence-2’s image–prompt embeddings with text embeddings derived from label definitions, learning a one-to-one alignment between visual and semantic spaces that yields a coherent multilabel probability distribution. Across 23,000 training frames and 5,000 test frames of labeled elementary classroom data, FABLE was used to identify the Florence-2 pretrained tasks that fine-tune most effectively for classroom action recognition. The model achieved a micro F1 score of 0.74, micro mAP score of 0.74, macro F1 score of 0.77, and macro mAP score of 0.65, and we further report detailed performance across individual classes.
A dynamic predictive transformer with temporal relevance regression for action detection
Pattern Recognition · 2025-04-14 · 4 citations
article · Senior author · Corresponding
Why instructional activities within classroom activity structures matter and how teacher dashboards can support advancements in instruction
Edward Elgar Publishing eBooks · 2025-03-14 · 1 citation
book chapter · Senior author
Causal State Space Model for Video Understanding
IEEE Signal Processing Letters · 2025-01-01
article · Senior author
We present a causal state space model (CSSM) for video understanding that couples a learned causal DAG with latent state dynamics. Latent factors form DAG nodes, enabling explicit cause–effect modeling over time; the state-space form provides efficient sequence inference, while the graph adds interpretability and robustness to distribution shifts. We learn the latent graph and inject its adjacency into the transition operator. On HMDB-51, UCF-101, and HAR, CSSM improves accuracy over strong baselines and supports counterfactual reasoning about video events.
SemanticBox: Bounding Box-Guided Caption Enhanced Action Recognition for Instructional Videos
2025-09-14
article
Multimodal action recognition within complex scenes requires a comprehensive understanding of the entire scene, encompassing both the visual and audio aspects of the video. Contrastive Language-Image Pretraining (CLIP) is a well-known backbone for multimodal action recognition tasks, as seen in ActionCLIP and its variants. However, these models are subject to a major weakness: overemphasis on the background. SemanticBox integrates bounding boxes into the video action recognition CLIP-style paradigm to add visual clues that boost the model’s classification performance. Additionally, a pretrained generative classifier is added to provide rich frame descriptions, enhancing the textual feature semantics and offering an additional performance boost. SemanticBox achieves impressive performance on a complex instructional video dataset characterized by background clutter, achieving comparable Recall@2 to state-of-the-art CLIP-based models and outperforming them in Top-1 and Top-2 accuracy, F1 score, and mean average precision (mAP).
PSF-SRDN: Point Spread Function-Aware Speckle Reducing Diffusion Network
2025-08-18 · 1 citation
article · Senior author
Ultrasound images are corrupted by signal-dependent speckle, degrading the image quality and presenting challenges for downstream tasks such as segmentation and classification. The ultrasound transducer, as modeled by the point spread function (PSF), further distorts the speckle and the signal. The PSF has different lateral and axial distortions which should be considered in the design of efficient speckle removal methods. To this end, we propose a novel lateral and axial distortion-aware diffusion network that encodes the spectrum of lateral and axial distortions, thus enabling adaptive denoising of images corrupted with speckle. The distortions have been modeled in the forward and reverse processes of a multiplicative noise-based diffusion model. Extensive experiments on two datasets establish the efficiency of the proposed model over state-of-the-art methods. The code and data are available at https://github.com/soumeeguha/PSF-SRDN.
A dynamic fractional generalized deterministic annealing for rapid convergence in deep learning optimization
npj Artificial Intelligence · 2025-10-01
article · Open access · Senior author
Optimization is central to classical and modern machine learning. This paper introduces Dynamic Fractional Generalized Deterministic Annealing (DF-GDA), a physics-inspired algorithm that boosts stability and speeds convergence across a wide range of models, especially deep networks. Unlike traditional methods such as Stochastic Gradient Descent, which may converge slowly or become trapped in local minima, DF-GDA employs an adaptive, temperature-controlled schedule that balances global exploration with precise refinement. Its dynamic fractional-parameter update selectively optimizes model components, improving computational efficiency. The method excels on high-dimensional tasks, including image classification, and also strengthens simpler classical models by reducing local-minimum risk and increasing robustness to noisy data. Extensive experiments on sixteen large, interdisciplinary datasets, including image classification, natural language processing, healthcare, and biology, show that DF-GDA consistently outperforms both state-of-the-art and traditional optimizers in convergence speed and accuracy, offering a powerful alternative for critical large-scale, complex problems across diverse scientific and industrial settings today.

Recent grants

NIH Grant R21HL068510
NIH · $141k · 2005
ABI Innovation: Towards the Neurome -- Automated Image Analysis for Neuroinformatics
NSF · $483k · 2011–2016
NIH Grant R33HL068510
NIH · $847k · 2005
EAGER: Spatiotemporal Transformer for Activity Recognition
NSF · $281k · 2023–2026

Frequent coauthors

Nilanjan Ray
46 shared
Tamal Batabyal
Massachusetts Institute of Technology
34 shared
Peter Youngs
31 shared
Klaus Ley
28 shared
Matthew Korban
University of Virginia
26 shared
John A. Hossack
University of Virginia
25 shared
Andrea Vaccari
24 shared
Zongli Lin
University of Virginia
22 shared

Labs

Virginia Image and Video Analysis (VIVA) · PI

Awards & honors

IEEE Fellow 2013
Faculty Innovation Award 2017
All-University Teaching Award 2009
Outstanding Young Electrical Engineer 1996
Director’s Award for Superior Accomplishment, National Scien…
