
Aruna Balasubramanian
Research Assistant Professor · Stony Brook University · Computer Science
Active 1981–2026
About
Aruna Balasubramanian is a faculty member at Stony Brook University whose research lies at the intersection of networking and systems, with a particular focus on smartphones and wearable devices. Her work explores various aspects of mobile computing, including performance optimization, system design for accessibility, and mobile user experience. She leads a research group that investigates these topics, contributing to advancements in mobile networking protocols, video streaming, and accessibility technologies for visually impaired users. Aruna has been recognized with several awards, including the 2021 SIGMOBILE Rockstar award and the Google Research Scholar Award in 2021, which supports her research on improving the accessibility of mobile applications. She has also taken on leadership roles in the academic community, chairing major conferences such as MobiSys 2022 and COMSNETS 2021, and serving on steering committees for programs aimed at fostering inclusive computing. Her academic career milestones include receiving tenure and promotion to Associate Professor in January 2021, reflecting her significant contributions to the field of mobile systems and networking.
Research topics
- Computer Science
- Artificial Intelligence
- Natural Language Processing
- Electrical engineering
- Computer network
- Real-time computing
- Computer vision
- Cognitive psychology
- Psychology
- Linguistics
- Programming language
- Engineering
Selected publications
Dual-Foundation Models for Unsupervised Domain Adaptation
ArXiv.org · 2026-05-05
Article · Open access
Semantic segmentation provides pixel-level scene understanding essential for autonomous driving and fine-grained perception tasks. However, training segmentation models requires costly, labor-intensive annotations on real-world datasets. Unsupervised Domain Adaptation (UDA) addresses this by training models on labeled synthetic data and adapting them to unlabeled real images. While conceptually simple, adaptation is challenging due to the domain gap, i.e., differences in visual appearance and scene structure between synthetic and real data. Prior approaches bridge this gap through pixel-level mixing or feature-level contrastive learning. Yet, these techniques suffer from two major limitations: (1) reliance on high-confidence pseudo-labels restricts learning to a subset of the target domain, and (2) prototype-based contrastive methods initialize class prototypes from source-trained models, yielding biased and unstable anchors during adaptation. To address these issues, we propose a dual-foundation UDA framework that leverages two complementary foundation models. First, we employ the Segment Anything Model (SAM) with superpixel-guided prompting to enable learning from a broader range of target pixels beyond high-confidence predictions. Second, we incorporate DINOv3 to construct stable, domain-invariant class prototypes through its robust representation learning. Our method achieves consistent improvements of +1.3% and +1.4% mIoU over strong UDA baselines on GTA-to-Cityscapes and SYNTHIA-to-Cityscapes, respectively.
ArXiv.org · 2025-09-02
Preprint · Open access
Multi-agent systems with smaller language models (SLMs) present a viable alternative to single-agent systems powered by large language models (LLMs) for addressing complex problems. In this work, we study how these alternatives compare in terms of both effectiveness and efficiency. To study this trade-off, we instantiate single and multi-agent systems for the complex problems in the AppWorld environment using different-sized language models. We find that difficulties with long-trajectory learning in SLMs limit their performance. Even when trained for specialized roles, SLMs fail to learn all subtasks effectively. To address this issue, we introduce a simple progressive sub-task training strategy, which introduces new sub-tasks progressively in each training epoch. We find that this novel strategy, analogous to instance-level curriculum learning, consistently improves the effectiveness of multi-agent systems across all configurations. Our Pareto analysis shows that fine-tuned multi-agent systems yield better effectiveness-efficiency trade-offs. Additional ablations and analyses show the importance of our progressive training strategy and its ability to reduce subtask error rates.
Investigating WebRTC BBR as an alternative to GCC for live video streaming
2025-01-06
Article
Google Congestion Control (GCC) is the default congestion control algorithm for WebRTC, a popular web application used for live video streaming. BBR, also developed at Google, is commonly used for streaming pre-recorded video on services like YouTube. However, BBR has not been widely deployed for real-time applications like live video streaming. It was implemented for WebRTC in 2018, but it was later deprecated due to poor performance. While GCC performs well under most network conditions, it can be starved by a loss-based TCP flow using the same bottleneck link. In this work, we investigate the possibility of using BBR as an alternative to GCC for WebRTC congestion control. We test it under a variety of network conditions and find that it performs better than GCC when competing with TCP, and it achieves bitrates comparable to GCC’s in isolation, except when bandwidth is restricted and the bottleneck buffer is deep. We find that this is because of bandwidth overestimation, a problem which also exists in TCP BBR. While modifying WebRTC BBR’s bandwidth estimation fails to improve performance in our experiments, we do find that disabling its recovery state, a unique loss response, improves WebRTC BBR’s performance in under-provisioned networks.
Degrees of Decentralized Freedom: Comparing Modern Decentralized Storage Platforms
2025-06-10
Article · Senior author
Decentralized storage platforms distribute control across individual peers, thus reducing reliance on a single entity and mitigating common vulnerabilities of centralized storage systems. In this paper, we compare the architecture and operation of four popular decentralized storage platforms: IPFS, Filecoin, Swarm, and Storj. Our study reveals significant implementation differences in four key aspects: data routing, data persistence, incentivization mechanisms, and resource requirements. These architectural decisions directly influence network characteristics, performance metrics, and economic sustainability. We collect comprehensive snapshots of the entire network to analyze network properties including peer uptime, geographical distribution, and network availability. Our analysis shows that while IPFS maintains the largest user base, it exhibits the lowest peer uptime due to lack of incentivization, with 50% of peers online for less than 4 days. In contrast, incentivized platforms exhibited median peer uptimes around 80–96% of the study period. We found considerable performance variations that directly correlate with implementation choices. Storj, with its centralized data routing architecture, achieves performance nearly on par with centralized solutions like Google Drive. In contrast, Swarm showed the slowest performance metrics with its full commitment to decentralization. Finally, our analysis reveals that cryptocurrency price fluctuations significantly influence participation and cost, suggesting potential sustainability challenges in these decentralized storage platforms.
Artspeak: An Interactive AR Application for Lifelike Speaking with Art Portraits
2025-10-08
Article
Museum visits often lack personalized and interactive experiences, limiting visitor engagement with art and historical artifacts. To address this, we present ArtSpeak, a standalone augmented reality (AR) application that transforms traditional art viewing into an interactive storytelling experience. When users point their mobile cameras at an artwork, the system responds to their questions with lifelike, talking-head video narratives generated from historical portraits. However, generating such talking-head videos at runtime is computationally expensive, often requiring over a minute per response. To address this challenge, ArtSpeak introduces two major contributions. First, it employs a collection of frequently asked questions (FAQ) to generate a set of lifelike video responses for various art portraits. Second, it introduces a novel retrieval-based approach that uses GPT-based embeddings and cosine similarity to select the most relevant response. As a result, the system dynamically presents the video reply that best aligns with the user's inquiry, reducing computational overhead and ensuring a real-time, low-latency experience. More precisely, ArtSpeak achieves over 30× lower latency and reduces energy consumption by approximately 81% compared to the real-time video generation method. User studies further validate the system's effectiveness, with 85% of participants rating the retrieved responses as relevant to their queries and 90% reporting smooth video playback. These results highlight the efficiency and user satisfaction enabled by our retrieval-based approach.
GestureVoice: Enabling Multimodal Text Editing for Blind Users Using Gestures and Voice
2025-10-22 · 2 citations
Article · Senior author
Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P
ArXiv.org · 2025-12-14
Preprint · Open access
With the widespread adoption of Large Language Models (LLMs), the energy cost of running LLMs is quickly becoming a critical concern. However, precisely measuring the energy consumption of LLMs is often infeasible because hardware-based power monitors are not always accessible and software-based energy measurement tools are not accurate. While various prediction techniques have been developed to estimate LLM energy consumption, these approaches are limited to single-GPU environments and thus are not applicable to modern LLM inference, which is typically parallelized across multiple GPUs. In this work, we remedy this gap and introduce PIE-P, a fine-grained energy prediction framework for multi-GPU inference, including tensor, pipeline, and data parallelism. Predicting the energy under parallelized inference is complicated by the non-determinism in inter-GPU communication, additional communication overheads, and difficulties in isolating energy during the communication/synchronization phase. We develop a scalable prediction framework that addresses these issues via precise sampling, fine-grained modeling of inter-GPU communication, and careful accounting of parallelization overhead. Our evaluation results show that PIE-P yields accurate and fine-grained energy predictions across parallelism strategies, significantly outperforming baselines.
Hand Gesture Recognition for Blind Users by Tracking 3D Gesture Trajectory
2024-05-11 · 5 citations
Article · Open access · Senior author
Hand gestures provide an alternate interaction modality for blind users and can be supported using commodity smartwatches without requiring specialized sensors. The enabling technology is an accurate gesture recognition algorithm, but almost all algorithms are designed for sighted users. Our study shows that blind user gestures are considerably different from sighted users, rendering current recognition algorithms unsuitable. Blind user gestures have high inter-user variance, making learning gesture patterns difficult without large-scale training data. Instead, we design a gesture recognition algorithm that works on a 3D representation of the gesture trajectory, capturing motion in free space. Our insight is to extract a micro-movement in the gesture that is user-invariant and use this micro-movement for gesture classification. To this end, we develop an ensemble classifier that combines image classification with geometric properties of the gesture. Our evaluation demonstrates a 92% classification accuracy, surpassing the next best state-of-the-art which has an accuracy of 82%.
Secrets are Forever: Characterizing Sensitive File Leaks on IPFS
2024-06-03 · 2 citations
Article · Senior author
The InterPlanetary File System (IPFS) is an emerging peer-to-peer hypermedia protocol designed to enhance the speed, security, and openness of the web. Utilizing content-based addressing, IPFS establishes a decentralized, distributed, and trustless network for data storage and delivery. Despite its growing popularity, the inherent openness of IPFS raises concerns about accidental sharing of sensitive files, posing potential threats to user privacy and security. In this paper, we conduct a measurement study to investigate the extent of sensitive file sharing on the IPFS network. Using IPFS-search, a widely-used search engine indexing IPFS content, we identified over 2,000 files containing sensitive information such as API keys and private SSH keys. However, as IPFS-search operates on a centralized infrastructure, access restrictions may limit opportunistic attacks. To demonstrate the feasibility of identifying sensitive content, we deployed two IPFS nodes, recording file announcements from nearby peers, and identified over 700 sensitive files. Furthermore, we deployed honeypot IPFS nodes to gauge potential exploitation of these sensitive files by malicious actors over a six-month period. Our findings indicate that while sensitive files are indeed being shared on the IPFS network, there is currently no evidence of exploitation by attackers. However, with the increasing popularity of IPFS, the risk of such attacks is likely to rise. Our study underscores the importance of acknowledging the risks associated with sharing files on the IPFS network. As IPFS continues to gain traction, proactive measures must be taken to address vulnerabilities and safeguard sensitive data from potential exploitation.
Recent grants
NeTS: Small: Mobile Power Management as a Network Primitive
NSF · $499k · 2012–2015
NeTS: Small: Mobile Power Management as a Network Primitive
NSF · $70k · 2015–2016
CSR: Small: Collaborative Research: Easily Adapting Apps to Diverse Wearable Form Factors
NSF · $284k · 2017–2020
CRII: NeTS: Making Sense of Mobile Web Page Performance
NSF · $174k · 2016–2019
NSF · $500k · 2019–2024
Frequent coauthors
- 16 shared
Qingqing Cao
Nanjing Institute of Industry Technology
- 15 shared
Niranjan Balasubramanian
- 11 shared
Anshul Gandhi
Stony Brook University
- 11 shared
Arun Venkataramani
University of Massachusetts Amherst
- 10 shared
Javad Nejati
Stony Brook University
- 9 shared
Brian Neil Levine
- 8 shared
Yi Cao
Educational Testing Service
- 7 shared
Conor Kelton
Stony Brook University
Labs
Networking and systems research with a focus on smartphones and wearable devices
Awards & honors
- Applied Networking Research Prize (2015)
- Sigcomm Dissertation Award Runner Up (2011)
- UMass Outstanding Dissertation Award (2011)
- Computing Innovation Fellowship (2010-2012)
- Microsoft Research Fellowship (2008-2010)