
Katherine Flanigan
Assistant Professor · Carnegie Mellon University · Civil and Environmental Engineering
Active 1966–2026
About
Katherine Flanigan is an Assistant Professor in Civil and Environmental Engineering. Her research focuses on climate-resilient environmental systems and technologies, sustainable energy and transportation systems, and intelligent engineered systems and society. Her work involves developing innovative solutions to environmental challenges, emphasizing sustainability and resilience in engineering practices.
Research topics
- Computer Science
- Knowledge management
- Computer Security
- Data science
- Engineering
- Artificial Intelligence
- Business
- Systems engineering
- Human–computer interaction
- Structural engineering
- Reliability engineering
- Risk analysis (engineering)
- Management science
Selected publications
Evaluating Few-Shot Temporal Reasoning of LLMs for Human Activity Prediction in Smart Environments
ArXiv.org · 2026-01-20
Article · Open access
Anticipating human activities and their durations is essential in applications such as smart-home automation, simulation-based architectural and urban design, activity-based transportation system simulation, and human-robot collaboration, where adaptive systems must respond to human activities. Existing data-driven agent-based models (from rule-based to deep learning) struggle in low-data environments, limiting their practicality. This paper investigates whether large language models, pre-trained on broad human knowledge, can fill this gap by reasoning about everyday activities from compact contextual cues. We adopt a retrieval-augmented prompting strategy that integrates four sources of context (temporal, spatial, behavioral history, and persona) and evaluate it on the CASAS Aruba smart-home dataset. The evaluation spans two complementary tasks: next-activity prediction with duration estimation, and multi-step daily sequence generation, each tested with various numbers of few-shot examples provided in the prompt. Analyzing few-shot effects reveals how much contextual supervision is sufficient to balance data efficiency and predictive accuracy, particularly in low-data environments. Results show that large language models exhibit strong inherent temporal understanding of human behavior: even in zero-shot settings, they produce coherent daily activity predictions, while adding one or two demonstrations further refines duration calibration and categorical accuracy. Beyond a few examples, performance saturates, indicating diminishing returns. Sequence-level evaluation confirms consistent temporal alignment across few-shot conditions. These findings suggest that pre-trained language models can serve as promising temporal reasoners, capturing both recurring routines and context-dependent behavioral variations, thereby strengthening the behavioral modules of agent-based models.
Evaluating Few-Shot Temporal Reasoning of LLMs for Human Activity Prediction in Smart Environments
2026-03-09
Article
Human-centered simulation of intersection lighting: a parametric study of design tradeoffs
Internet of Things · 2026-04-01
Article · Open access · Corresponding
Pedestrian safety in urban environments remains a critical public health issue, with nighttime conditions substantially increasing crash risk for vulnerable road users. Yet intersection lighting design remains constrained by roadway-centric standards and static workflows, offering limited support for evaluating tradeoffs among multidirectional pedestrian visibility, glare, and light trespass, and environmentally sensitive designs are often presumed, without evidence, to compromise safety. We introduce SALUSLux, an open-source, programmable simulation toolkit for pedestrian-centered intersection lighting analysis, and apply it to a parametric study of 2,304 configurations on a standard four-way intersection. Results reveal three findings with direct implications for practice and standards. First, spatial geometry and luminaire properties interact so strongly that they cannot be optimized independently: a luminaire that performs well in one configuration can fail in another, making joint design-space exploration essential. Second, semi-cylindrical illuminance was the most difficult metric to satisfy across all scenarios and should be elevated to a required standard for intersection crosswalks; horizontal illuminance, the dominant metric in current practice, provided little additional information once other criteria were met, and overreliance on it risks encouraging excessive lighting that increases glare without improving pedestrian safety. Third, warm-color 2,700 K lighting fully satisfies all pedestrian visibility thresholds when paired with appropriate spatial configuration, directly contradicting the assumed safety-sustainability tradeoff. Together, these findings provide an evidence base for intersection-specific lighting criteria that existing tools cannot deliver.
Open MIND · 2026-01-23
Preprint
Social infrastructure and other built environments are increasingly expected to support well-being and community resilience by enabling social interaction. Yet in civil and built-environment research, there is no consistent and privacy-preserving way to represent and measure socially meaningful interaction in these spaces, leaving studies to operationalize "interaction" differently across contexts and limiting practitioners' ability to evaluate whether design interventions are changing the forms of interaction that social capital theory predicts should matter. To address this field-level and methodological gap, we introduce the Dyadic User Engagement DataseT (DUET) dataset and an embedded kinesics recognition framework that operationalize Ekman and Friesen's kinesics taxonomy as a function-level interaction vocabulary aligned with social capital-relevant behaviors (e.g., reciprocity and attention coordination). DUET captures 12 dyadic interactions spanning all five kinesic functions (emblems, illustrators, affect displays, adaptors, and regulators) across four sensing modalities and three built-environment contexts, enabling privacy-preserving analysis of communicative intent through movement. Benchmarking six open-source, state-of-the-art human activity recognition models quantifies the difficulty of communicative-function recognition on DUET and highlights the limitations of ubiquitous monadic, action-level recognition when extended to dyadic, socially grounded interaction measurement. Building on DUET, our recognition framework infers communicative function directly from privacy-preserving skeletal motion without handcrafted action-to-function dictionaries; using a transfer-learning architecture, it reveals structured clustering of kinesic functions and a strong association between representation quality and classification performance while generalizing across subjects and contexts.
ArXiv.org · 2026-01-23
Article · Open access
Evaluating Few-Shot Temporal Reasoning of LLMs for Human Activity Prediction in Smart Environments
Open MIND · 2026-01-20
Preprint
Bridging the Reality Gap in Digital Twins with Context-Aware, Physics-Guided Deep Learning
Journal of Computing in Civil Engineering · 2026-01-28 · 2 citations
Article · Open access
Digital twins (DTs) enable powerful predictive analytics, but persistent discrepancies between simulations and real systems, known as the reality gap, undermine their reliability. Coined in robotics, the term now applies to DTs, where discrepancies stem from context mismatches, cross-domain interactions, and multiscale dynamics. Among these, context mismatch is pressing and underexplored, as DT accuracy depends on capturing operational context, often only partially observable. However, DTs have a key advantage: simulators can systematically vary contextual factors and explore scenarios difficult or impossible to observe empirically, informing inference and model alignment. While sim-to-real transfer like domain adaptation shows promise in robotics, its application to DTs poses two key challenges. First, unlike one-time policy transfers, DTs require continuous calibration across an asset’s lifecycle, demanding structured information flow, timely detection of out-of-sync states, and integration of historical and new data. Second, DTs often perform inverse modeling, inferring latent states or faults from observations that may reflect multiple evolving contexts. These needs strain purely data-driven models and risk violating physical consistency. Though some approaches preserve validity via a reduced-order model, most domain adaptation techniques still lack such constraints. To address this, we propose a reality gap analysis (RGA) module for DTs that continuously integrates new sensor data, detects misalignments, and recalibrates DTs via a query-response framework. Our approach fuses domain-adversarial deep learning with reduced-order simulator guidance to improve context inference and preserve physical consistency. We illustrate the RGA module in a structural health monitoring case study on a steel truss bridge in Pittsburgh, PA, showing faster calibration and better real-world alignment.
Computers, Environment and Urban Systems · 2026-02-26
Article · Open access · Corresponding
People spend the majority of their lives within built environments, whose design can profoundly influence human- and community-centered outcomes such as social capital formation, access to opportunity, public health, and resilience to disruption. Just as the built environment shapes human behavior and well-being, its design, operation, and performance can be substantially improved by better understanding how people actually use and experience space. Yet both of these goals (enhancing human benefits from built environments and improving system performance through human-aware design) are constrained by a fundamental limitation: existing computational models oversimplify human agents, equipping them with static or assumed behavioral rules that fail to reflect the dynamic, adaptive, and context-sensitive nature of real-world behavior. These simplifications undermine generalizability, limiting the ability of such models to transfer insights across scenarios or support the design of responsive, human-centered spaces. To overcome these limitations, we introduce EMPIRE (Empirical Modeling of People in Responsive Environments), a data-driven, hierarchical model for predicting human spatio-temporal behavior in dynamic physical environments, with a focus on scenario-based generalizability. Driven by in-situ data, EMPIRE integrates Imitation Learning for strategic activity planning and Reinforcement Learning for generating adaptive execution policies based on interpretation of the environment and preferences. This multi-layered decomposition mirrors the cognitive structure of human decision making, enabling modularity, interpretability, and adaptability across unseen spatial configurations.
To illustrate EMPIRE’s generalizability, we simulate human behavior in a social infrastructure setting (i.e., a park) by generating synthetic ground-truth trajectories that incorporate heterogeneous agent preferences, environmental dynamics, and social constraints. We conduct a systematic evaluation across six distinct park layouts using a leave-one-layout-out strategy, where models are trained on five configurations and tested on the sixth. This setup allows assessment of EMPIRE’s capacity to generalize to various unseen spatial scenarios. Experimental results demonstrate that EMPIRE successfully transfers learned behavioral patterns to new environments.
- Data-driven agent-based model learns activities and preferences from in-situ data.
- Hierarchical IL-GNN-RL structure mirrors human cognition for behavior simulation.
- GNN learns preference-based rewards from physical, environmental, and social features.
- Modular, data-driven foundation for rapid what-if built environment analysis.
2025-12-11
Article · Senior author · Corresponding
Cyber-physical-social infrastructure systems (CPSIS) account for human-centered interactions and benefits overlooked by traditional cyber-physical systems. This requires defining social benefits, measuring and interpreting human interactions with each other and with infrastructure in a privacy-preserving way, modeling these interactions for prediction, linking observed outcomes to social benefits, and operating and/or designing the physical environment to produce desired social outcomes. Within this feedback cycle, this paper specifically delves into recognizing dyadic human interactions using real-world data, which is the backbone to measuring and interpreting social behavior. This work addresses the existing need to enhance broader understanding of the deeper meanings and mutual responses inherent in human interactions. While RGB cameras have been informative for interaction recognition, privacy concerns arise. Depth sensors offer a privacy-conscious alternative. This study presents a taxonomy for analyzing dyadic interactions and compares five skeleton-based interaction recognition algorithms on a data set of 12 dyadic interactions. Unlike most data sets, these interactions, categorized into communication types like emblems and affect displays, offer insights into the cultural and emotional aspects of human interactions.
Frequent coauthors
- J. Moore · Carnegie Mellon University · 43 shared
- Sarah Christian · Stanford University · 42 shared
- Fethiye Ozis · Northern Arizona University · 42 shared
- Gerald Wang · Cornell University · 42 shared
- Jerome P. Lynch · Duke University · 17 shared
- Mario Bergés · Carnegie Mellon University · 10 shared
- Mohammed Ettouney · 6 shared
- Sizhe Ma · 6 shared
Labs
Civil and Environmental Engineering at Carnegie Mellon University · PI
Education
- 2014 · B.S., Civil and Environmental Engineering · Princeton University
- 2016 · Other, Civil Engineering · University of Michigan
- 2018 · Other, Electrical and Computer Engineering · University of Michigan
- 2020 · Ph.D., Civil Engineering Intelligent Systems · University of Michigan
Awards & honors
- NSF Graduate Research Fellow
- College of Engineering Richard and Eleanor Towner Prize for…
- K&L Gates Presidential Fellowship
- Eisenhower Transportation Fellowship