
Bo Ji
· Assistant ProfessorVerifiedVirginia Tech · Computer Science
Active 2003–2026
About
Bo Ji is an Associate Professor in the Department of Computer Science at Virginia Tech. His research interests include networking, machine learning, security and privacy, and augmented/virtual reality. He holds a Ph.D. in electrical and computer engineering from The Ohio State University, obtained in 2012, and has earned master's and bachelor's degrees in information science and electronic engineering from Zhejiang University, China, in 2006 and 2004 respectively. His professional contact information includes an email address (boji@vt.edu) and a phone number (540-231-0331). His office is located at Gilbert Place, RM 4203, 220 Gilbert St., Blacksburg, VA. His research focuses on advancing knowledge in his areas of expertise through academic and applied projects.
Research topics
- Computer Science
- Artificial Intelligence
- Distributed computing
- Machine Learning
- Mathematical optimization
- Algorithm
- Computer network
- Parallel computing
- Embedded system
- Mathematics
Selected publications
Polymorph: Energy-Efficient Multi-Label Classification for Video Streams on Embedded Devices
2026-03-06
articleOpen accessReal-time multi-label video classification on embedded devices is constrained by limited compute and energy budgets. Yet, video streams exhibit structural properties such as label sparsity, temporal continuity, and label co-occurrence that can be leveraged for more efficient inference. We introduce Polymorph, a context-aware framework that activates a minimal set of lightweight Low Rank Adapters (LoRA) per frame. Each adapter specializes in a subset of classes derived from co-occurrence patterns and is implemented as a LoRA weight over a shared backbone. At runtime, Polymorph dynamically selects and composes only the adapters needed to cover the active labels, avoiding fullmodel switching and weight merging. This modular strategy improves scalability while reducing latency, and energy overhead. Polymorph achieves 40% lower energy consumption and improves mAP by 9 points over strong baselines executing the TAO dataset.
2025-03-08 · 4 citations
articleMixed Reality (MR) devices are being increasingly adopted across a wide range of real-world applications, ranging from education and healthcare to remote work and entertainment. However, the unique immersive features of MR devices, such as 3D spatial interactions and the encapsulation of virtual objects by invisible elements, introduce new vulnerabilities leading to interaction obstruction and misdirection. We implemented latency, click redirection, object occlusion, and spatial occlusion attacks within a remote collaborative MR platform using the Microsoft HoloLens 2 and evaluated user behavior and mitigations through a user study. We compared responses to MR-specific attacks, which exploit the unique characteristics of remote collaborative immersive environments, and traditional security attacks implemented in MR. Our findings indicate that users generally exhibit lower recognition rates for immersive attacks (e.g., spatial occlusion) compared to attacks inspired by traditional ones (e.g., click redirection). Our results demonstrate a clear gap in user awareness and responses when collaborating remotely in MR environments. Our findings emphasize the importance of training users to recognize potential threats and enhanced security measures to maintain trust in remote collaborative MR systems.
Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11 · 6 citations
articleOpen accessSenior authorHigh-resolution Vision-Language Models (VLMs) are widely used in multimodal tasks to enhance accuracy by preserving detailed image information. However, these models often generate an excessive number of visual tokens due to the need to encode multiple partitions of a high-resolution image input. Processing such a large number of visual tokens poses significant computational challenges, particularly for resource-constrained commodity GPUs. To address this challenge, we propose High-Resolution Early Dropping (HiRED), a plug-and-play token-dropping method designed to operate within a fixed token budget. HiRED leverages the attention of CLS token in the vision transformer (ViT) to assess the visual content of the image partitions and allocate an optimal token budget for each partition accordingly. The most informative visual tokens from each partition within the allocated budget are then selected and passed to the subsequent Large Language Model (LLM). We showed that HiRED achieves superior accuracy and performance, compared to existing token-dropping methods. Empirically, HiRED-20% (i.e., a 20% token budget) on LLaVA-Next-7B achieves a 4.7x increase in token generation throughput, reduces response latency by 78%, and saves 14% of GPU memory for single inference on an NVIDIA TESLA P40 (24 GB). For larger batch sizes (e.g., 4), HiRED-20% prevents out-of-memory errors by cutting memory usage by 30%, while preserving throughput and latency benefits.
SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
2025-12-03 · 2 citations
articleOpen accessThe growing gap between the increasing complexity of large language models (LLMs) and the limited computational budgets of edge devices poses a key challenge for efficient on-device inference, despite gradual improvements in hardware capabilities. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or lead to substantial cost burdens. This position paper introduces a new framework that leverages speculative decoding, previously viewed primarily as a decoding acceleration technique for autoregressive generation of LLMs, as a promising approach specifically adapted for edge computing by orchestrating computation across heterogeneous devices. We propose SLED, a framework that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server verifies the tokens utilizing a more precise target model. To further increase the efficiency of verification, the edge server batches the diverse verification requests from devices. This approach supports heterogeneous devices and reduces server-side memory footprint by sharing a single upstream target model across devices. Our initial experiments with Jetson Orin Nano, Raspberry Pi 4B/5, and an edge server equipped with 4 Nvidia A100 GPUs indicate substantial benefits: ×2.2 higher system throughput, ×2.8 higher system capacity, and better cost efficiency, all without sacrificing model accuracy.
ArXiv.org · 2025-08-03
preprintOpen accessDeploying deep neural networks (DNNs) on resource-constrained IoT devices remains a challenging problem, often requiring hardware modifications tailored to individual AI models. Existing accelerator-generation tools, such as AMD's FINN, do not adequately address extreme resource limitations faced by IoT endpoints operating in bare-metal environments without an operating system (OS). To overcome these constraints, we propose MARVEL-an automated, end-to-end framework that generates custom RISC-V ISA extensions tailored to specific DNN model classes, with a primary focus on convolutional neural networks (CNNs). The proposed method profiles high-level DNN representations in Python and generates an ISA-extended RISC-V core with associated compiler tools for efficient deployment. The flow leverages (1) Apache TVM for translating high-level Python-based DNN models into optimized C code, (2) Synopsys ASIP Designer for identifying compute-intensive kernels, modeling, and generating a custom RISC-V and (3) Xilinx Vivado for FPGA implementation. Beyond a model class specific RISC-V, our approach produces an optimized bare-metal C implementation, eliminating the need for an OS or extensive software dependencies. Unlike conventional deployment pipelines relying on TensorFlow/PyTorch runtimes, our solution enables seamless execution in highly resource-constrained environments. We evaluated the flow on popular DNN models such as LeNet-5*, MobileNetV1, ResNet50, VGG16, MobileNetV2 and DenseNet121 using the Synopsys trv32p3 RISC-V core as a baseline. Results show a 2x speedup in inference and upto 2x reduction in energy per inference at a 28.23% area overhead when implemented on an AMD Zynq UltraScale+ ZCU104 FPGA platform.
P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices
2025-08-04
articleSenior authorSplit Learning (SL) is an emerging privacy-preserving machine learning technique that enables resource constrained edge devices to participate in model training by partitioning a model into client-side and server-side sub-models. While SL reduces computational overhead on edge devices, it encounters significant challenges in heterogeneous environments where devices vary in computing resources, communication capabilities, environmental conditions, and privacy requirements. Although recent studies have explored heterogeneous SL frameworks that optimize split points for devices with varying resource constraints, they often neglect personalized privacy requirements and local model customization under varying environmental conditions. To address these limitations, we propose P3SL, a Personalized Privacy-Preserving Split Learning framework designed for heterogeneous, resource-constrained edge device systems. The key contributions of this work are twofold. First, we design a personalized sequential split learning pipeline that allows each client to achieve customized privacy protection and maintain personalized local models tailored to their computational resources, environmental conditions, and privacy needs. Second, we adopt a bi-level optimization technique that empowers clients to determine their own optimal personalized split points without sharing private sensitive information (i.e., computational resources, environmental conditions, privacy requirements) with the server. This approach balances energy consumption and privacy leakage risks while maintaining high model accuracy. We implement and evaluate P3SL on a testbed consisting of 7 devices including 4 Jetson Nano P3450 devices, 2 Raspberry Pis, and 1 laptop, using diverse model architectures and datasets under varying environmental conditions. Experimental results demonstrate that P3SL significantly mitigates privacy leakage risks, reduces system energy consumption by up to 59.12%, and consistently retains high accuracy compared to the state-of-the-art heterogeneous SL system.
ArXiv.org · 2025-01-27
preprintOpen accessMixed Reality (MR) devices are being increasingly adopted across a wide range of real-world applications, ranging from education and healthcare to remote work and entertainment. However, the unique immersive features of MR devices, such as 3D spatial interactions and the encapsulation of virtual objects by invisible elements, introduce new vulnerabilities leading to interaction obstruction and misdirection. We implemented latency, click redirection, object occlusion, and spatial occlusion attacks within a remote collaborative MR platform using the Microsoft HoloLens 2 and evaluated user behavior and mitigations through a user study. We compared responses to MR-specific attacks, which exploit the unique characteristics of remote collaborative immersive environments, and traditional security attacks implemented in MR. Our findings indicate that users generally exhibit lower recognition rates for immersive attacks (e.g., spatial occlusion) compared to attacks inspired by traditional ones (e.g., click redirection). Our results demonstrate a clear gap in user awareness and responses when collaborating remotely in MR environments. Our findings emphasize the importance of training users to recognize potential threats and enhanced security measures to maintain trust in remote collaborative MR systems.
SfM-Free 3D Gaussian Splatting via Hierarchical Training
2025-06-10 · 5 citations
article1st authorCorrespondingStandard 3D Gaussian Splatting (3DGS) relies on known or pre-computed camera poses and a sparse point cloud, obtained from structure-from-motion (SfM) preprocessing, to initialize and grow 3D Gaussians. We propose a novel SfM-Free 3DGS (HT-3DGS) method for video input, eliminating the need for known camera poses and SfM preprocessing. Our approach introduces a hierarchical training strategy that trains and merges multiple 3D Gaussian representations – each optimized for specific scene regions – into a single, unified 3DGS model representing the entire scene. To compensate for large camera motions, we leverage video frame interpolation models. Additionally, we incorporate multi-source supervision to reduce overfitting and enhance representation. Experimental results reveal that our approach significantly surpasses state-of-the-art SfM-free novel view synthesis methods. On the Tanks and Temples dataset, we improve PSNR by an average of 2.25dB, with a maximum gain of 3.72dB in the best scene. On the CO3D-V2 dataset, we achieve an average PSNR boost of 1.74dB, with a top gain of 3.90dB. The code is available at https://github.com/jibo27/3DGS_Hierarchical_Training.
ArXiv.org · 2025-08-09
preprintOpen accessAs Extended Reality (XR) devices become increasingly prevalent in everyday settings, they raise significant privacy concerns for bystanders: individuals in the vicinity of an XR device during its use, whom the device sensors may accidentally capture. Current privacy indicators, such as small LEDs, often presume that bystanders are attentive enough to interpret the privacy signals. However, these cues can be easily overlooked when bystanders are distracted or have limited vision. We define such individuals as situationally impaired bystanders. This study explores XR privacy indicator designs that are effective for situationally impaired bystanders. A focus group with eight participants was conducted to design five novel privacy indicators. We evaluated these designs through a user study with seven additional participants. Our results show that visual-only indicators, typical in commercial XR devices, received low ratings for perceived usefulness in impairment scenarios. In contrast, multimodal indicators were preferred in privacy-sensitive scenarios with situationally impaired bystanders. Ultimately, our results highlight the need to move toward adaptable, multimodal, and situationally aware designs that effectively support bystander privacy in everyday XR environments.
2025-10-08 · 2 citations
articleOpen accessAs Extended Reality (XR) devices become increasingly prevalent in everyday settings, they raise significant privacy concerns for bystanders: individuals in the vicinity of an XR device during its use, whom the device sensors may accidentally capture. Current privacy indicators, such as small LEDs, often presume that bystanders are attentive enough to interpret the privacy signals. However, these cues can be easily overlooked when bystanders are distracted or have limited vision. We define such individuals as situationally impaired bystanders. This study explores XR privacy indicator designs that are effective for situationally impaired bystanders. A focus group with eight participants was conducted to design five novel privacy indicators. We evaluated these designs through a user study with seven additional participants. Our results show that visual-only indicators, typical in commercial XR devices, received low ratings for perceived usefulness in impairment scenarios. In contrast, multimodal indicators were preferred in privacy-sensitive scenarios with situationally impaired bystanders. Ultimately, our results highlight the need to move toward adaptable, multimodal, and situationally aware designs that effectively support bystander privacy in everyday XR environments.
Recent grants
NSF · $427k · 2017–2021
NSF · $301k · 2020–2023
CRII: CIF: Models, Theories and Algorithms for Timeliness Optimization in Information-update Systems
NSF · $189k · 2017–2021
Frequent coauthors
- 22 shared
Ness B. Shroff
- 20 shared
Bin Li
- 14 shared
Fengjiao Li
- 14 shared
Zhongdong Liu
- 13 shared
Yu Sang
Liaoning Technical University
- 12 shared
Jia Liu
Samsung (China)
- 10 shared
Liping Qian
Zhejiang University of Technology
- 10 shared
Gamal Sallam
Temple University
Education
- 2012
PhD, Electrical and Computer Engineering
The Ohio State University
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Bo Ji
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup