Jon E. Froehlich
· ProfessorVerifiedUniversity of Washington · Computer Science & Engineering
Active 1967–2026
About
Jon E. Froehlich is the Lab Director of the Makeability Lab at the University of Washington's Computer Science department. He earned his PhD in 2011 from the University of Washington with a dissertation titled 'Sensing and Feedback of Everyday Activities to Promote Environmental Behaviors.' His research focuses on designing, building, and evaluating new interactive tools and techniques to address pressing societal challenges. The Makeability Lab emphasizes both technological innovations that enable new human abilities and an educational mission to help students gain new skills through research, invention, and human-centered design.
Research topics
- Mathematics
- Engineering
- Computer Science
- Mathematics education
- World Wide Web
- Epistemology
- Psychology
- Multimedia
- Human–computer interaction
- Geography
Selected publications
CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation
Open MIND · 2026-02-20
preprintSenior authorVision-Language Models (VLMs) have shown remarkable progress in Vision-Language Navigation (VLN), offering new possibilities for navigation decision-making that could benefit both robotic platforms and human users. However, real-world navigation is inherently conditioned by the agent's mobility constraints. For example, a sweeping robot cannot traverse stairs, while a quadruped can. We introduce Capability-Conditioned Navigation (CapNav), a benchmark designed to evaluate how well VLMs can navigate complex indoor spaces given an agent's specific physical and operational capabilities. CapNav defines five representative human and robot agents, each described with physical dimensions, mobility capabilities, and environmental interaction abilities. CapNav provides 45 real-world indoor scenes, 473 navigation tasks, and 2365 QA pairs to test if VLMs can traverse indoor environments based on agent capabilities. We evaluate 13 modern VLMs and find that current VLM's navigation performance drops sharply as mobility constraints tighten, and that even state-of-the-art models struggle with obstacle types that require reasoning on spatial dimensions. We conclude by discussing the implications for capability-aware navigation and the opportunities for advancing embodied spatial reasoning in future VLMs. The benchmark is available at https://github.com/makeabilitylab/CapNav
CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation
arXiv (Cornell University) · 2026-02-20
articleOpen accessSenior authorVision-Language Models (VLMs) have shown remarkable progress in Vision-Language Navigation (VLN), offering new possibilities for navigation decision-making that could benefit both robotic platforms and human users. However, real-world navigation is inherently conditioned by the agent's mobility constraints. For example, a sweeping robot cannot traverse stairs, while a quadruped can. We introduce Capability-Conditioned Navigation (CapNav), a benchmark designed to evaluate how well VLMs can navigate complex indoor spaces given an agent's specific physical and operational capabilities. CapNav defines five representative human and robot agents, each described with physical dimensions, mobility capabilities, and environmental interaction abilities. CapNav provides 45 real-world indoor scenes, 473 navigation tasks, and 2365 QA pairs to test if VLMs can traverse indoor environments based on agent capabilities. We evaluate 13 modern VLMs and find that current VLM's navigation performance drops sharply as mobility constraints tighten, and that even state-of-the-art models struggle with obstacle types that require reasoning on spatial dimensions. We conclude by discussing the implications for capability-aware navigation and the opportunities for advancing embodied spatial reasoning in future VLMs. The benchmark is available at https://github.com/makeabilitylab/CapNav
arXiv (Cornell University) · 2026-03-27
preprintOpen accessPublic art can hold cultural, social, political, and aesthetic significance, enriching urban environments and promoting well-being. However, a majority of urban art is inaccessible to blind and low vision (BLV) people. Most art access research has focused on private and curated settings (e.g., museums, galleries) and most urban access work has centered on outdoor navigation, leaving urban and public art accessibility largely understudied. We conducted semi-structured interviews with 16 BLV participants, using design probes featuring AI-generated descriptions and real-time AI interactions to investigate preferences for both discovering and engaging with urban art. We found that BLV people valued spontaneous art exploration, multisensory (e.g., tactile, auditory, olfactory) engagement, and detailed descriptions of culturally significant artwork. Participants also highlighted challenges distinct to urban art contexts: safety took precedence over art exploration, multisensory access measures could be disruptive to others in the public space, and inaccurate AI descriptions could lead to cultural erasure. Our contributions include empirical insights on BLV preferences for urban art discovery and engagement, seven design dimensions for public art access solutions, and implications for expanding HCI urban accessibility research beyond navigation.
ARGaze: Autoregressive Transformers for Online Egocentric Gaze Estimation
Open MIND · 2026-02-04
preprintOnline egocentric gaze estimation predicts where a camera wearer is looking from first-person video using only past and current frames, a task essential for augmented reality and assistive technologies. Unlike third-person gaze estimation, this setting lacks explicit head or eye signals, requiring models to infer current visual attention from sparse, indirect cues such as hand-object interactions and salient scene content. We observe that gaze exhibits strong temporal continuity during goal-directed activities: knowing where a person looked recently provides a powerful prior for predicting where they look next. Inspired by vision-conditioned autoregressive decoding in vision-language models, we propose ARGaze, which reformulates gaze estimation as sequential prediction: at each timestep, a transformer decoder predicts current gaze by conditioning on (i) current visual features and (ii) a fixed-length Gaze Context Window of recent gaze target estimates. This design enforces causality and enables bounded-resource streaming inference. We achieve state-of-the-art performance across multiple egocentric benchmarks under online evaluation, with extensive ablations validating that autoregressive modeling with bounded gaze history is critical for robust prediction. We will release our source code and pre-trained models.
GeoVisA11y: An AI-based Geovisualization Question-Answering System for Screen-Reader Users
Open MIND · 2026-03-08
preprintSenior authorGeovisualizations are powerful tools for communicating spatial information, but are inaccessible to screen-reader users. To address this limitation, we present GeoVisA11y, an LLM-based question-answering system that makes geovisualizations accessible through natural language interaction. The system supports map reading, analysis, interpretation and navigation by handling analytical, geospatial, visual and contextual queries. Through user studies with 12 screen-reader users and sighted participants, we demonstrate that GeoVisA11y effectively bridges accessibility gaps while revealing distinct interaction patterns between user groups. We contribute: (1) an open-source, accessible geovisualization system, (2) empirical findings on query and navigation differences, and (3) a dataset of geospatial queries to inform future research on accessible data visualization.
ARGaze: Autoregressive Transformers for Online Egocentric Gaze Estimation
ArXiv.org · 2026-02-04
articleOpen accessOnline egocentric gaze estimation predicts where a camera wearer is looking from first-person video using only past and current frames, a task essential for augmented reality and assistive technologies. Unlike third-person gaze estimation, this setting lacks explicit head or eye signals, requiring models to infer current visual attention from sparse, indirect cues such as hand-object interactions and salient scene content. We observe that gaze exhibits strong temporal continuity during goal-directed activities: knowing where a person looked recently provides a powerful prior for predicting where they look next. Inspired by vision-conditioned autoregressive decoding in vision-language models, we propose ARGaze, which reformulates gaze estimation as sequential prediction: at each timestep, a transformer decoder predicts current gaze by conditioning on (i) current visual features and (ii) a fixed-length Gaze Context Window of recent gaze target estimates. This design enforces causality and enables bounded-resource streaming inference. We achieve state-of-the-art performance across multiple egocentric benchmarks under online evaluation, with extensive ablations validating that autoregressive modeling with bounded gaze history is critical for robust prediction. We will release our source code and pre-trained models.
2026-04-13 · 1 citations
articleOpen accessSenior authorUrban cycling benefits personal wellbeing, public health, and global sustainability. While current tools such as Google and Apple Maps provide bike route recommendations, they do not account for a person’s dynamic context (e.g., commuting, recreation). We introduce BikeButler, a personalized, context-sensitive bicycle route generation tool that enables users to generate, compare, virtually preview, and iteratively customize bike routes via custom profiles that encode seven bikeability features, including bike lane existence, slope, vegetation, and surface quality—fusing data from OpenStreetMap, open government data, and a custom VLM-based analysis of Street View images. To design BikeButler, we employed a human-centered, iterative approach starting with formative interviews and culminating in a user study (N=16). Our findings demonstrate that bike routing preferences change as a function of context, that BikeButler enables users to quickly create and iterate context-sensitive routes, and that generated routes differ significantly from Google Maps bike routing, reinforcing the importance of personalization.
GeoVisA11y: An AI-based Geovisualization Question-Answering System for Screen-Reader Users
ArXiv.org · 2026-03-08
articleOpen accessSenior authorGeovisualizations are powerful tools for communicating spatial information, but are inaccessible to screen-reader users. To address this limitation, we present GeoVisA11y, an LLM-based question-answering system that makes geovisualizations accessible through natural language interaction. The system supports map reading, analysis, interpretation and navigation by handling analytical, geospatial, visual and contextual queries. Through user studies with 12 screen-reader users and sighted participants, we demonstrate that GeoVisA11y effectively bridges accessibility gaps while revealing distinct interaction patterns between user groups. We contribute: (1) an open-source, accessible geovisualization system, (2) empirical findings on query and navigation differences, and (3) a dataset of geospatial queries to inform future research on accessible data visualization.
arXiv (Cornell University) · 2026-03-27
articleOpen accessPublic art can hold cultural, social, political, and aesthetic significance, enriching urban environments and promoting well-being. However, a majority of urban art is inaccessible to blind and low vision (BLV) people. Most art access research has focused on private and curated settings (e.g., museums, galleries) and most urban access work has centered on outdoor navigation, leaving urban and public art accessibility largely understudied. We conducted semi-structured interviews with 16 BLV participants, using design probes featuring AI-generated descriptions and real-time AI interactions to investigate preferences for both discovering and engaging with urban art. We found that BLV people valued spontaneous art exploration, multisensory (e.g., tactile, auditory, olfactory) engagement, and detailed descriptions of culturally significant artwork. Participants also highlighted challenges distinct to urban art contexts: safety took precedence over art exploration, multisensory access measures could be disruptive to others in the public space, and inaccurate AI descriptions could lead to cultural erasure. Our contributions include empirical insights on BLV preferences for urban art discovery and engagement, seven design dimensions for public art access solutions, and implications for expanding HCI urban accessibility research beyond navigation.
ArXiv.org · 2026-02-09
articleOpen accessProject Sidewalk is a web-based platform that enables crowdsourcing accessibility of sidewalks at city-scale by virtually walking through city streets using Google Street View. The tool has been used in 40 cities across the world, including the US, Mexico, Chile, and Europe. In this paper, we describe adaptation efforts to enable deployment in Chandigarh, India, including modifying annotation types, provided examples, and integrating VLM-based mission guidance, which adapts instructions based on a street scene and metadata analysis. Our evaluation with 3 annotators indicates the utility of AI-mission guidance with an average score of 4.66. Using this adapted Project Sidewalk tool, we conduct a Points of Interest (POI)-centric accessibility analysis for three sectors in Chandigarh with very different land uses, residential, commercial and institutional covering about 40 km of sidewalks. Across 40 km of roads audited in three sectors and around 230 POIs, we identified 1,644 of 2,913 locations where infrastructure improvements could enhance accessibility.
Recent grants
NSF · $550k · 2014–2019
HCC: Medium: Combining Crowdsourcing and Computer Vision for Street-level Accessibility
NSF · $1.2M · 2013–2019
CAREER: A Tangible-Graphical Approach to Engage Young Children in Wearable Design
NSF · $541k · 2017–2024
NSF · $2.0M · 2021–2026
CAREER: A Tangible-Graphical Approach to Engage Young Children in Wearable Design
NSF · $227k · 2017–2018
Frequent coauthors
- 72 shared
Leah Findlater
Apple (United States)
- 24 shared
Liang He
- 20 shared
Lee Stearns
University of Maryland, College Park
- 19 shared
Dhruv Jain
- 17 shared
James A. Landay
Stanford University
- 17 shared
Michael Saugstad
University of Washington
- 16 shared
Manaswi Saha
Accenture (Switzerland)
- 14 shared
Uran Oh
Ewha Womans University
Labs
An advanced research lab in Human-Computer Interaction and AI
Education
- 2009
Ph.D., Computer Science
University of Washington
- 2004
M.S., Computer Science
University of Washington
- 2002
B.S., Computer Science and Engineering
University of California, San Diego
Awards & honors
- 21 Best Paper and Honorable Mention awards
- Sloan Fellowship
- UW Distinguished Dissertation Award
- Google Faculty Research Awards
- UW College of Engineering Outstanding Faculty Award (2021)
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Jon E. Froehlich
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup