Scott T. Acton
Chair, Charles L. Brown Department of Electrical and Computer Engineering; American Telephone and Telegraph Company Professor of Engineering; Professor, Biomedical Engineering (By Courtesy) · University of Virginia · Electrical and Computer Engineering
Active 1990–2026
About
Professor Scott T. Acton is the Chair of the Charles L. Brown Department of Electrical and Computer Engineering at the University of Virginia and holds the title of American Telephone and Telegraph Company Professor of Engineering. He is also a Professor of Biomedical Engineering by courtesy. Professor Acton leads the Virginia Image and Video Analysis (VIVA) laboratory at UVA, which specializes in biological image analysis problems. The research focus of VIVA includes machine learning techniques for image and video analysis, artificial intelligence applications in education, as well as tracking, segmentation, and enhancement of images and videos. Under his leadership, the lab pursues advancements in these areas to address complex challenges in biological imaging and related fields.
Research topics
- Computer Science
- Artificial Intelligence
- Physics
- Biology
- Astronomy
- Computer vision
- Biological system
- History
- Optics
Selected publications
IEEE Transactions on Computational Biology and Bioinformatics · 2026-02-11
article · Senior author
Automatic cell tracking in dense environments is plagued by inaccurate correspondences and misidentification of parent-offspring relationships. In this paper, we introduce a novel cell tracking algorithm named DenseTrack, which integrates deep learning with mathematical model-based strategies to effectively establish correspondences between consecutive frames and detect cell division events in crowded scenarios. We formulate the cell tracking problem as a deep learning-based temporal sequence classification task followed by solving a constrained one-to-one matching optimization problem exploiting the classifier's confidence scores. Additionally, we present an eigendecomposition-based cell division detection strategy that leverages knowledge of cellular geometry. The performance of the proposed approach has been evaluated by tracking densely packed cells in 3D time-lapse image sequences of bacterial biofilm development. The experimental results on simulated as well as experimental fluorescence image sequences suggest that the proposed tracking method achieves superior performance in terms of both qualitative and quantitative evaluation measures compared to recent state-of-the-art cell tracking approaches.
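The constrained one-to-one matching step mentioned in this abstract can be illustrated with a minimal sketch. This is not the DenseTrack implementation: the function name `best_one_to_one_matching`, the square score matrix, and the confidence values are all assumptions made for illustration, and the brute-force search is only practical for a handful of cells.

```python
from itertools import permutations


def best_one_to_one_matching(scores):
    """Return (assignment, total) that maximizes the summed confidence.

    scores[i][j] is a hypothetical classifier confidence that cell i in
    frame t corresponds to cell j in frame t+1. Assumes a square matrix.
    Every permutation is tried, so each cell in frame t is paired with
    exactly one distinct cell in frame t+1.
    """
    n = len(scores)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Made-up confidences: rows 0 and 1 both prefer column 0, so picking the
# per-row maximum independently would violate the one-to-one constraint.
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.7, 0.1],
    [0.1, 0.2, 0.6],
]
assignment, total = best_one_to_one_matching(scores)
```

Here `assignment[i]` gives the frame t+1 cell matched to cell i. For realistic cell counts, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would likely replace the exhaustive search.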
DEMIX: Dual-Encoder Latent Masking Framework for Mixed Noise Reduction in Ultrasound Imaging
Open MIND · 2026-02-06
preprint · Senior author
Ultrasound imaging is widely used in noninvasive medical diagnostics due to its efficiency, portability, and avoidance of ionizing radiation. However, its utility is limited by the quality of the signal. Signal-dependent speckle noise, signal-independent sensor noise, and non-uniform spatial blurring caused by the transducer and modeled by the point spread function (PSF) degrade the image quality. These degradations challenge conventional image restoration methods, which assume simplified noise models, and highlight the need for specialized algorithms capable of effectively reducing the degradations while preserving fine structural details. We propose DEMIX, a novel dual-encoder denoising framework with a masked gated fusion mechanism, for denoising ultrasound images degraded by mixed noise and further degraded by PSF-induced distortions. DEMIX is inspired by diffusion models and is characterized by a forward process and a deterministic reverse process. DEMIX adaptively assesses the different noise components, disentangles them in the latent space, and suppresses these components while compensating for PSF degradations. Extensive experiments on two ultrasound datasets, along with a downstream segmentation task, demonstrate that DEMIX consistently outperforms state-of-the-art baselines, achieving superior noise suppression and preserving structural details. The code will be made publicly available.
DEMIX: Dual-Encoder Latent Masking Framework for Mixed Noise Reduction in Ultrasound Imaging
arXiv (Cornell University) · 2026-02-06
article · Open access · Senior author
FABLE: Florence-2–Assisted Behavioral Learning and Embedding for Multilabel Action Recognition
2025-10-26
article · Senior author
Understanding complex activities in a scene requires capturing subtle actor–actor and actor–object interactions, a problem made significantly harder when restricted to a single video frame. Generative Visual Language Models (VLMs) have a salient ability to construct coherent captions, identify object regions, or even identify objects from a given phrase. Florence-2–Assisted Behavioral Learning and Embedding (FABLE) for Multilabel Action Recognition combines Florence-2’s region-based visual grounding with discriminative text-embedding cues to understand the teacher-student interactions and individual activities within an elementary classroom environment. Analogous to CLIP, FABLE employs a cross-similarity mechanism to align visual and textual representations, generating a logit distribution that identifies the most probable actions within each frame. However, FABLE incorporates Florence-2’s image–prompt embeddings with text embeddings derived from label definitions, learning a one-to-one alignment between visual and semantic spaces that yields a coherent multilabel probability distribution. Across 23,000 training frames and 5,000 test frames of labeled elementary classroom data, FABLE was used to identify the Florence-2 pretrained tasks that fine-tune most effectively for classroom action recognition. The model achieved a micro F1 score of 0.74, micro mAP score of 0.74, macro F1 score of 0.77, and macro mAP score of 0.65, and we further report detailed performance across individual classes.
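The abstract reports both micro and macro F1, which can differ noticeably in multilabel settings. The sketch below shows how the two averages are computed from per-label counts. It is not taken from the FABLE code: the function names and the toy label vectors are assumptions chosen for illustration.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Compute (micro F1, macro F1) for multilabel 0/1 predictions.

    y_true and y_pred are lists of equal-length 0/1 vectors, one per frame.
    """
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]  # [tp, fp, fn] per label
    for t_row, p_row in zip(y_true, y_pred):
        for k, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                counts[k][0] += 1
            elif p:
                counts[k][1] += 1
            elif t:
                counts[k][2] += 1
    # Macro: average the per-label F1 scores, so every label weighs the same.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels first, so frequent labels dominate.
    micro = f1(*[sum(c[i] for c in counts) for i in range(3)])
    return micro, macro


# Toy data (made up): label 0 is frequent and always right, label 1 is rare
# and always wrong.
y_true = [[1, 0], [1, 0], [1, 1], [1, 0]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 1]]
micro, macro = micro_macro_f1(y_true, y_pred)
```

In this toy case the micro score is 0.8 while the macro score is 0.5, because the rare label counts as much as the frequent one in the macro average.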
A dynamic predictive transformer with temporal relevance regression for action detection
Pattern Recognition · 2025-04-14 · 4 citations
article · Senior author · Corresponding
Edward Elgar Publishing eBooks · 2025-03-14 · 1 citation
book-chapter · Senior author
Causal State Space Model for Video Understanding
IEEE Signal Processing Letters · 2025-01-01
article · Senior author
We present a causal state space model (CSSM) for video understanding that couples a learned causal DAG with latent state dynamics. Latent factors form DAG nodes, enabling explicit cause–effect modeling over time; the state-space form provides efficient sequence inference, while the graph adds interpretability and robustness to distribution shifts. We learn the latent graph and inject its adjacency into the transition operator. On HMDB-51, UCF-101, and HAR, CSSM improves accuracy over strong baselines and supports counterfactual reasoning about video events.
SemanticBox: Bounding Box-Guided Caption Enhanced Action Recognition for Instructional Videos
2025-09-14
article
Multimodal action recognition within complex scenes requires a comprehensive understanding of the entire scene, encompassing both the visual and audio aspects of the video. Contrastive Language-Image Pre-training (CLIP) is a well-known backbone for multimodal action recognition tasks, as seen in ActionCLIP and its variants. However, these models are subject to a major weakness: overemphasis on the background. SemanticBox integrates bounding boxes into the video action recognition CLIP-style paradigm to add visual clues that boost the model’s classification performance. Additionally, a pretrained generative classifier is added to provide rich frame descriptions, enhancing the textual feature semantics and offering an additional performance boost. SemanticBox achieves impressive performance on a complex instructional video dataset characterized by background clutter, achieving comparable Recall@2 to state-of-the-art CLIP-based models and outperforming them in Top-1 and Top-2 accuracy, F1 score, and mean average precision (mAP).
PSF-SRDN: Point Spread Function-Aware Speckle Reducing Diffusion Network
2025-08-18 · 1 citation
article · Senior author
Ultrasound images are corrupted by signal-dependent speckle, degrading the image quality and presenting challenges for downstream tasks such as segmentation and classification. The ultrasound transducer, as modeled by the point spread function (PSF), further distorts the speckle and the signal. The PSF has different lateral and axial distortions which should be considered in the design of efficient speckle removal methods. To this end, we propose a novel lateral and axial distortion-aware diffusion network that encodes the spectrum of lateral and axial distortions, thus enabling adaptive denoising of images corrupted with speckle. The distortions have been modeled in the forward and reverse processes of a multiplicative noise-based diffusion model. Extensive experiments on two datasets establish the efficiency of the proposed model over state-of-the-art methods. The code and data are available at https://github.com/soumeeguha/PSF-SRDN.
npj Artificial Intelligence · 2025-10-01
article · Open access · Senior author
Optimization is central to classical and modern machine learning. This paper introduces Dynamic Fractional Generalized Deterministic Annealing (DF-GDA), a physics-inspired algorithm that boosts stability and speeds convergence across a wide range of models, especially deep networks. Unlike traditional methods such as Stochastic Gradient Descent, which may converge slowly or become trapped in local minima, DF-GDA employs an adaptive, temperature-controlled schedule that balances global exploration with precise refinement. Its dynamic fractional-parameter update selectively optimizes model components, improving computational efficiency. The method excels on high-dimensional tasks, including image classification, and also strengthens simpler classical models by reducing local-minimum risk and increasing robustness to noisy data. Extensive experiments on sixteen large, interdisciplinary datasets, including image classification, natural language processing, healthcare, and biology, show that DF-GDA consistently outperforms both state-of-the-art and traditional optimizers in convergence speed and accuracy, offering a powerful alternative for critical large-scale, complex problems across diverse scientific and industrial settings today.
Recent grants
NIH · $141k · 2005
ABI Innovation: Towards the Neurome -- Automated Image Analysis for Neuroinformatics
NSF · $483k · 2011–2016
NIH · $847k · 2005
EAGER: Spatiotemporal Transformer for Activity Recognition
NSF · $281k · 2023–2026
Frequent coauthors
- Nilanjan Ray · 46 shared
- Tamal Batabyal · Massachusetts Institute of Technology · 34 shared
- Peter Youngs · 31 shared
- Klaus Ley · 28 shared
- Matthew Korban · University of Virginia · 26 shared
- John A. Hossack · University of Virginia · 25 shared
- Andrea Vaccari · 24 shared
- Zongli Lin · University of Virginia · 22 shared
Awards & honors
- IEEE Fellow 2013
- Faculty Innovation Award 2017
- All-University Teaching Award 2009
- Outstanding Young Electrical Engineer 1996
- Director’s Award for Superior Accomplishment, National Scien…