
Stephen Guy
Associate Professor · University of Minnesota · Computer Science and Engineering
Active 1998–2026
About
Stephen Guy is an Associate Professor in the Department of Computer Science & Engineering at the University of Minnesota Twin Cities. He joined the department as an assistant professor in 2012 and was promoted to associate professor in 2018. He also serves as the director of graduate studies. His educational background includes a B.S. in Computer Engineering from the University of Virginia (2006), and both an M.S. (2009) and a Ph.D. (2012) in Computer Science from the University of North Carolina at Chapel Hill. His research focuses on motion planning, predictive simulations, and human behavioral analysis, with particular emphasis on approaches that leverage large-scale data analysis, optimization, and machine learning. He is interested in cross-disciplinary applications of his work to domains such as robotics, video games, virtual reality, and medicine. Stephen Guy has received recognition for his teaching, including the Charles E. Bowers Faculty Teaching Award in 2018, and is a member of professional organizations such as ACM, IEEE, the IEEE Computer Society, and the IEEE Robotics and Automation Society.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Data Mining
- Statistics
- Human–computer interaction
- Mathematics
- Distributed computing
- Real-time computing
- Algorithm
Selected publications
Least-effort trajectories lead to emergent crowd behaviors
UNC Libraries · 2026-04-03
Article · Open access
Pedestrian crowds often have been modeled as many-particle systems, usually using computer models known as multiagent simulations. The key challenge in modeling crowds is to develop rules that guide how the particles or agents interact with each other in a way that faithfully reproduces paths and behaviors commonly seen in real human crowds. Here, we propose a simple and intuitive formulation of these rules based on biomechanical measurements and the principle of least effort. We present a constrained optimization method to compute collision-free paths of minimum caloric energy for each agent, from which collective crowd behaviors can be reproduced. We show that our method reproduces common crowd phenomena, such as arching and self-organization into lanes. We also validate the flow rates and paths produced by our method and compare them to those of real-world crowd trajectories.
Design Principles Guide Meaningful Play by Improving Ease and Intentionality of Game Design
International Journal of Social Science Research · 2026-03-15
Article · Open access
This project examines the role of design principles and how they impact the experience of players using a digital minigame inspired by the CLUE board game. We examine the role of design principles in guiding users toward finding hidden clues and whether users are drawn to spots in which design principles have been deliberately embedded. We investigate this by examining whether pre-selected spots with design principles are perceived as more intentional than a baseline of spots identified by random points, mediated through the GroundingDINO object detector. Results showed that players were twice as likely to find design-principle spots as randomly selected spots. Players also found the design-principle spots more intuitive and perceived them as more intentional than the random baseline. In addition, results showed that different design principles had varying effects, which suggests they can improve game design and allow designers to control the difficulty of the game experience.
Psychometrika · 2026-04-22
Article · Open access
This article develops an analysis pipeline for quantifying and relating mouth shape variation to the emotions perceived from facial expressions. We use open-source data that contains ratings from 802 fairgoers on 27 smile-like expressions. Each rater was given a list of seven emotions (happy, sad, anger, contempt, fear, surprise, and disgust) and asked to select all of the words that best described the facial expression. To develop a generalizable method for quantifying mouth shape variation, we leverage statistical shape analysis techniques to parameterize each mouth's shape in terms of 30 systematically placed landmarks that outline the upper and lower lips. Furthermore, we demonstrate that a three-dimensional representation of these landmark coordinates produces an interpretable feature set that outperforms the original and full-dimensional feature sets in terms of predictive performance. To connect the mouth shape features to the emotion ratings, we develop a nonparametric multinomial regression model that is capable of shrinkage and selection with high-dimensional predictors. Our results demonstrate that the proposed method can produce easily interpretable model predictions that enhance our understanding of the nature in which subtle variations in mouth shape affect the perception of a facial expression.
Simultaneous Localization and Affordance Prediction of Tasks from Egocentric Video
2025-05-19
Article · Senior author
Vision-Language Models (VLMs) have shown great success as foundational models for downstream vision and natural language applications in a variety of domains. However, these models are limited to reasoning over objects and actions currently visible on the image plane. We present a spatial extension to the VLM, which leverages spatially-localized egocentric video demonstrations to augment VLMs in two ways - through understanding spatial task-affordances, i.e. where an agent must be for the task to physically take place, and the localization of that task relative to the egocentric viewer. We show our approach outperforms the baseline of using a VLM to map similarity of a task's description over a set of location-tagged images. Our approach has less error both on predicting where a task may take place and on predicting what tasks are likely to happen at the current location. The resulting representation will enable robots to use egocentric sensing to navigate to, or around, physical regions of interest for novel tasks specified in natural language.
bioRxiv (Cold Spring Harbor Laboratory) · 2025-08-07
Preprint · Open access
Assessment of reaching is foundational to upper limb neurorehabilitation. Current neurorehabilitation needs have increased the demand for quantitative clinical assessments of bilateral coordination. Robotics and computer vision for motion tracking are two means to provide relevant quantitative metrics but have many differences including the dimensionality of reaching movements (planar versus three-dimensional) and data acquisition. We do not know how consistent measures of bilateral coordination performance are between these different assessments. In this study, we examined how one robotic and one computer vision method can identify differences between symmetrical and asymmetrical reaching, and the correlations in movement time and hand lag between these two approaches. Thirty healthy young adults completed four reaching games using the Kinarm exoskeleton robot and a custom developed augmented reality assessment using computer vision. We found that both approaches were able to detect well-established movement time and hand lag differences between symmetrical and asymmetrical reaching, with the differences between symmetrical and asymmetrical being larger with the computer vision approach. Moderate correlations were found between approaches for unilateral and symmetric reaching in both movement time and hand lags; however, no significant correlations were found between approaches for asymmetric reaching. Our results show that reaching task performance differs between robotic and computer vision-based assessment; however, both approaches provide quantitative metrics of unilateral and bilateral reaching that are consistent with prior research. There are benefits and tradeoffs to each approach, and this study informs how clinicians and researchers can consider the methodological differences when determining which assessment method to use.
Smart Health · 2025-10-01
Article
Reaching Motion Characterization Across Childhood via Augmented Reality Games
ArXiv.org · 2025-02-20
Preprint · Open access · Senior author
While performance in coordinated motor tasks has been shown to improve in children as they age, the characterization of children's movement strategies has been underexplored. In this work, we use upper-body motion data collected from an augmented reality reaching game, and show that short (13 second) sections of motion are sufficient to reveal arm motion differences across child development. To explore what drives this trend, we characterize the movement patterns across different age groups by analyzing (1) directness of path, (2) maximum speed, and (3) progress towards the reaching target. We find that although maximum arm velocity decreases with age (p = 0.02), older children's paths to the goal are more direct (p = 0.03), allowing for faster time to goal overall. We also find that older children exhibit more anticipatory reaching behavior, enabling more accurate goal-reaching (i.e. no overshooting) compared to younger children. The resulting analysis has potential to improve the realism of child-like digital characters and advance our understanding of motor skill development.
Restorative Neurology and Neuroscience · 2025-12-22
Article · Open access · Senior author
Clinical assessments of the post-stroke upper limbs have several limitations in that they focus primarily on unilateral movements, rely on observer-based ordinal scales, and give limited insight into movement quality. Human pose estimation uses computer vision to extract motion data from videos, making it a clinically feasible tool to assess movement and overcome many challenges of traditional clinical assessments. Our objective was to demonstrate the use of video-based pose estimation to enhance the assessment of bilateral tasks in individuals post-stroke through visualizations and quantitative metrics. Using single camera video recordings of the Chedoke Hand and Arm Activity Inventory in two individuals with chronic stroke and one neurologically intact individual, we demonstrate differences in movement patterns including increased compensatory movements of proximal joints and asymmetries. We were able to detect differences that the traditional assessment scoring could not, demonstrating the potential of computer vision to enhance clinical assessment.
SENT Map -- Semantically Enhanced Topological Maps with Foundation Models
ArXiv.org · 2025-11-05
Preprint · Open access
We introduce SENT-Map, a semantically enhanced topological map for representing indoor environments, designed to support autonomous navigation and manipulation by leveraging advancements in foundational models (FMs). Through representing the environment in a JSON text format, we enable semantic information to be added and edited in a format that both humans and FMs understand, while grounding the robot to existing nodes during planning to avoid infeasible states during deployment. Our proposed framework employs a two stage approach, first mapping the environment alongside an operator with a Vision-FM, then using the SENT-Map representation alongside a natural-language query within an FM for planning. Our experimental results show that semantic-enhancement enables even small locally-deployable FMs to successfully plan over indoor environments.
Characterizing masticatory motion of dogs using optical and electromagnetic motion tracking
Frontiers in Veterinary Science · 2025-07-03
Article · Open access
Introduction: Accurate knowledge of masticatory motion across a variety of food materials is essential for ex-vivo testing and simulation of the food-teeth interaction. Yet, the masticatory motion has never been fully characterized in the domestic dog (Canis lupus), limiting our ability to perform ex-vivo modelling. Objective: The aim of this study was to characterize masticatory motion among a variety of different foods in beagle dogs using optical and electromagnetic motion tracking. Results: We confirmed that the masticatory pattern in the beagle is a hinge motion with no clinically meaningful horizontal motion of the mandible. The mouth opening did not differ significantly among food and treat types regardless of food stiffness and force to fracture of the food, with a mean and standard deviation of 2.51 ± 0.33 (range 1.93–2.95) cm between the canine teeth during chewing. Conversely, frequency of chewing was influenced by food type, with kibbles having a significantly higher peak mean chewing frequency (2.93 Hz) compared to other feeds. Frequency of chewing was linearly correlated to the force to fracture of the food material (p = 0.03, R² = 0.56), while stiffness of food did not significantly affect peak chewing frequency. Conclusion: Data from this study can guide ex-vivo modelling of the feed-teeth interaction for product design and testing, especially those that focus on prevention of periodontal disease and dentoalveolar trauma.
Recent grants
EAGER: Uncertainty-aware Planning for Robot Navigation in Human Environments
NSF · $170k · 2017–2020
Frequent coauthors
- Dinesh Manocha · University of Maryland, College Park · 44 shared
- Ming C. Lin · 31 shared
- Ioannis Karamouzas · University of California, Riverside · 25 shared
- Nick Sohre · University of Minnesota System · 17 shared
- Jur van den Berg · 14 shared
- Sujeong Kim · 12 shared
- Sofía Lyford-Pike · University of Minnesota · 11 shared
- Nathaniel E. Helwig · University of Minnesota · 11 shared
Labs
Applied Motion Lab · PI
Awards & honors
- 2018: College of Science and Engineering - Charles E. Bowers…