
Yun Raymond Fu
Professor, Jointly Appointed with College of Engineering · Northeastern University · Artificial Intelligence and Data Science
Active 2002–2025
About
Yun Raymond Fu is a professor in the Khoury College of Computer Sciences and the College of Engineering at Northeastern University in Boston. His research interests include machine learning, computational intelligence, big data mining, computer vision, pattern recognition, and cyber-physical systems. He has published extensively in leading journals, books, book chapters, and international conferences and workshops, and serves as an associate editor, chair, program committee member, and reviewer for many of these venues. Fu is also a serial entrepreneur in technology commercialization: he founded and served as president of Giaran, a Northeastern University spinoff focused on neural network-based augmented reality and facial image processing, which Shiseido acquired in 2017. He has received seven young investigator awards from NAE, ONR, ARO, IEEE, INNS, UIUC, and the Grainger Foundation, as well as nine best paper awards from IEEE, IAPR, SPIE, and SIAM. He has also received major industrial research awards from companies including Google, Samsung, Zebra, Adobe, and MathWorks. Fu is a Fellow of AAAS, IEEE, IAPR, OSA, SPIE, and AAIA; a Lifetime Distinguished Member of ACM; and a Lifetime Senior Member of AAAI and the Institute of Mathematical Statistics. He is also a member of the ACM Future of Computing Academy, the Global Young Academy, AAAS, and INNS.
Research topics
- Artificial Intelligence
- Computer Science
- Computer vision
- Machine Learning
- Mathematics
- Telecommunications
- Data Mining
- Theoretical computer science
- Engineering
- Speech recognition
- Algorithm
Selected publications
Deconstructing Spatial Intelligence in Vision-Language Models
2025-11-05
Preprint · Open access
Vision-Language Models (VLMs) have achieved remarkable success but exhibit a fundamental deficiency in spatial intelligence, a critical capability for progress in embodied AI, autonomous driving, and spatially coherent generation. In response, the research community has produced an explosion of work dedicated to enhancing these models, but this rapid progress has resulted in a fragmented and disorganized landscape lacking a unified framework. This paper presents the first comprehensive survey to address this gap, providing a systematic review that spans the foundations of spatial intelligence in VLMs, root causes of spatial limitations, enhancement methodologies, evaluation protocols, and real-world applications. Specifically, we introduce a novel, intervention-based taxonomy that categorizes enhancement methodologies according to where spatial information is incorporated: (1) training-free prompting, (2) model-centric enhancements (training strategies, architectural modules, encoder improvements), (3) explicit 2D information injection, (4) 3D spatial enrichment, and (5) data-centric approaches. To further assess the true capabilities of current models, we conduct a rigorous empirical study evaluating 37 models across 9 representative benchmarks. Our results and analysis reveal the state of the art, identify the strengths and weaknesses of different methods, and uncover critical limitations in existing evaluation protocols. By structuring this rapidly evolving field and establishing a clear research agenda, this survey serves as an indispensable resource for advancing the next generation of spatially intelligent AI systems.
Towards Open-set Face Anti-spoofing with Unseen Attack Synthesis
2025-05-26
Article · Senior author
Existing face anti-spoofing (FAS) methods have primarily focused on closed-set or cross-domain settings with a few predefined presentation attacks (PAs). However, with the continual emergence of diverse PAs, we argue that developing a generalizable FAS detector for unseen PAs deserves more attention from the FAS community. In this work, we investigate the open-set FAS setting, where the spoof types encountered at test time are unseen during training. The key challenge is that unseen PAs are designed to deceive the FAS model and therefore resemble live faces both at the pixel level and in the feature space. To address this issue, we propose a novel framework that synthesizes unseen PAs and pushes the generated samples toward an open category space. Our approach is motivated by the empirical finding that, in the spoof-type-aware feature space learned through multi-class optimization, unseen PAs tend to cluster compactly by spoof type and lie at the boundary of the live distribution. Finally, we evaluate our method on the SiW-Mv2 cross-type benchmark under both fine-grained and coarse-grained protocols, where it significantly outperforms the baselines and the strongest existing closed-set and cross-domain methods.
Trajectory Prediction Meets Large Language Models: A Survey
arXiv.org · 2025-06-03 · 1 citation
Preprint · Open access · Senior author
Recent advances in large language models (LLMs) have sparked growing interest in integrating language-driven techniques into trajectory prediction. By leveraging their semantic and reasoning capabilities, LLMs are reshaping how autonomous systems perceive, model, and predict trajectories. This survey provides a comprehensive overview of this emerging field, categorizing recent work into five directions: (1) Trajectory prediction via language modeling paradigms, (2) Direct trajectory prediction with pretrained language models, (3) Language-guided scene understanding for trajectory prediction, (4) Language-driven data generation for trajectory prediction, (5) Language-based reasoning and interpretability for trajectory prediction. For each, we analyze representative methods, highlight core design choices, and identify open challenges. This survey bridges natural language processing and trajectory prediction, offering a unified perspective on how language can enrich trajectory prediction.
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
2025-10-19
Article · Open access · Senior author
We present a novel perspective on learning video embedders for generative modeling: rather than requiring an exact reproduction of an input video, an effective embedder should focus on synthesizing visually plausible reconstructions. This relaxed criterion enables substantial improvements in compression ratios without compromising the quality of downstream generative models. Specifically, we propose replacing the conventional encoder-decoder video embedder with an encoder-generator framework that employs a diffusion transformer (DiT) to synthesize missing details from a compact latent space. Within this framework, we develop a dedicated latent conditioning module that conditions the DiT decoder on the encoded video latent embedding. Our experiments show that our approach achieves better encoding-decoding performance than state-of-the-art methods, particularly as the compression ratio increases. To demonstrate its efficacy, we report results from our video embedders achieving a temporal compression ratio of up to 32x (8x higher than leading video embedders) and validate the robustness of this ultra-compact latent space for text-to-video generation, providing a significant efficiency boost in latent diffusion model training and inference.
IEEE Transactions on Pattern Analysis and Machine Intelligence · 2025-10-31
Article · Senior author
Predicting trajectories is essential for interpreting human behavior, yet it remains a challenging task when relying solely on observed motion patterns. Despite substantial progress, most existing methods assume fully observed trajectories and fail to account for missing data caused by occlusion, limited field of view, or sensor failures. This limitation substantially compromises the reliability of trajectory prediction, particularly in real-world deployment where observations are often incomplete. In light of this issue, our work presents the Gaussian Mixture Conditional Variational Recurrent Neural Network (GMC-VRNN), which unifies trajectory imputation and prediction within a single framework. Our GMC-VRNN framework couples a Multi-Space Graph Neural Network (MS-GNN) with a Gaussian Mixture Conditional VRNN, further augmented by a Bidirectional Temporal Decay (BTD) module, to achieve robust spatio-temporal representation learning under incomplete observations. To verify its effectiveness, we conduct extensive evaluations on two sports datasets covering multiple scenarios, jointly tackling trajectory imputation and prediction. Our experiments confirm that GMC-VRNN surpasses recent state-of-the-art approaches, offering enhanced precision and stronger robustness under diverse conditions.
EmoGene: Audio-Driven Emotional 3D Talking-Head Generation
2025-05-26 · 1 citation
Article · Senior author
Audio-driven talking-head generation is a key technology for virtual human interaction and filmmaking. While recent advances have focused on improving image fidelity and lip synchronization, generating accurate emotional expressions remains underexplored. In this paper, we introduce EmoGene, a novel framework for synthesizing high-fidelity, audio-driven video portraits with accurate emotional expressions. Our approach employs a variational autoencoder (VAE)-based audio-to-motion module to generate facial landmarks, which are concatenated with an emotional embedding in a motion-to-emotion module to produce emotional landmarks. These landmarks drive a Neural Radiance Fields (NeRF)-based emotion-to-video module that renders realistic emotional talking-head videos. Additionally, we propose a pose sampling method to generate natural idle-state (non-speaking) videos for silent audio inputs. Extensive experiments demonstrate that EmoGene outperforms previous methods in generating high-fidelity emotional talking-head videos.
AdaSports-Traj: Role- and Domain-Aware Adaptation for Multi-Agent Trajectory Modeling in Sports
arXiv.org · 2025-09-19
Preprint · Open access · Senior author
Trajectory prediction in multi-agent sports scenarios is inherently challenging due to the structural heterogeneity across agent roles (e.g., players vs. ball) and dynamic distribution gaps across different sports domains. Existing unified frameworks often fail to capture these structured distributional shifts, resulting in suboptimal generalization across roles and domains. We propose AdaSports-Traj, an adaptive trajectory modeling framework that explicitly addresses both intra-domain and inter-domain distribution discrepancies in sports. At its core, AdaSports-Traj incorporates a Role- and Domain-Aware Adapter that conditionally adjusts latent representations based on agent identity and domain context. Additionally, we introduce a Hierarchical Contrastive Learning objective, which separately supervises role-sensitive and domain-aware representations to encourage disentangled latent structures without introducing optimization conflicts. Experiments on three diverse sports datasets, Basketball-U, Football-U, and Soccer-U, demonstrate the effectiveness of our adaptive design, achieving strong performance in both unified and cross-domain trajectory prediction settings.
Scientific Reports · 2025-04-24 · 1 citation
Article · Open access · 1st author · Corresponding
Osteoarthritis is a widespread chronic joint disease that is becoming increasingly prevalent, particularly among individuals over the age of 45. It causes joint pain and dysfunction that significantly disrupt daily life. The objective of this study is to develop an optimal machine learning model for predicting the risk of osteoarthritis in individuals aged 45 and older. The study used data from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018, covering a total of 2980 individuals. The dataset was randomly divided into a training set (n = 2235) and a validation set (n = 745). Five machine learning algorithms were used to develop the predictive model, analyzing 24 features in a cohort with an average age of 60, of whom 605 had been diagnosed with osteoarthritis. The SHapley Additive exPlanations (SHAP) method was used to interpret the models and identify the features most important for predicting outcomes. After Recursive Feature Elimination (RFE) reduced the feature set to 20 features, the CatBoost model achieved an AUC of 0.8109 and an accuracy of 0.7315, making it the best-performing model. The most influential predictors were gender, age, BMI, waist circumference, and race. This study demonstrates that a CatBoost model with 20 features can effectively predict the occurrence of osteoarthritis. Such a prediction model can help inform early interventions and patient management strategies, potentially improving patient prognosis. Future work will focus on improving model performance, for example by incorporating additional relevant features or refining existing ones. Validating the model in more diverse patient populations and investigating its potential for real-time use in clinical settings would further increase the study's impact and facilitate its translation into clinical practice.
Corrosion-sensitive structures stored in the marine environment liquid film prediction method
2025-07-08
Article
Corrosion of metal structures in marine environments is a major industrial challenge, especially in corrosion-sensitive areas such as gaps, sharp corners, and grooves, where capillary effects facilitate the formation of condensation films. This paper proposes a novel method for predicting liquid film formation in such structures, incorporating factors like capillary condensation and non-condensable gases. A model was developed to simulate the condensation behavior on metal bonded plates in marine environments, allowing for the prediction of liquid film accumulation under varying conditions. The study presents data on the water content of condensed liquid on metal surfaces, highlighting how different environmental factors influence corrosion risks. The findings offer a new approach for understanding metal corrosion and improving its prevention in marine atmospheric conditions.
LightAvatar: Efficient Head Avatar as Dynamic Neural Light Field
Lecture Notes in Computer Science · 2025-01-01 · 3 citations
Book chapter · Senior author
Recent grants
NSF · $1.2M · 2011–2013
NSF · $1.0M · 2012–2016
EAGER: Vision-Based Activity Forecasting by Mining Temporal Causalities
NSF · $180k · 2016–2019
Frequent coauthors
- Zhengming Ding (Tulane University) · 131 shared
- Ming Shao (Southeast University) · 78 shared
- Yulun Zhang · 65 shared
- Can Qin · 54 shared
- Sheng Li (Jiangsu Province Hospital) · 53 shared
- Lichen Wang (Tianjin University) · 47 shared
- Gan Sun · 44 shared
- Yu Kong (Center for Excellence in Brain Science and Intelligence Technology) · 42 shared
Awards & honors
- Seven young investigator awards from NAE, ONR, ARO, IEEE, IN…
- Nine best paper awards from IEEE, IAPR, SPIE, and SIAM
- Fellow of AAAS
- Fellow of IEEE
- Fellow of IAPR