
Ali Butt
· Professor · Virginia Tech · Computer Science
Active 2000–2026
About
Ali Butt is a Professor and Associate Department Head for Faculty Development in the Department of Computer Science at Virginia Tech. He is also the Director of the stack@cs Center for Computer Systems. His research interests include cloud and high-performance computing systems; systems support for machine and deep learning applications; file, I/O, and storage systems; distributed systems; and large-scale experimental computer systems. He holds a Ph.D. in electrical and computer engineering from Purdue University, obtained in 2006. His office is in Gilbert Place, Room 4108, at Virginia Tech.
Research signals
Five dimensions sourced from public faculty / publication signals.
Research topics
- Computer Science
- Operating system
- Machine Learning
- Distributed computing
- Data Mining
- Algorithm
- Artificial Intelligence
- Parallel computing
- Mathematics
- Geometry
- Computer graphics (images)
- Simulation
- Computer network
- Computer hardware
- Reliability engineering
- Engineering
- Database
- Statistics
- Computer architecture
Selected publications
Eliminate Branches by Melding IR Instructions (Artifact)
Zenodo (CERN European Organization for Nuclear Research) · 2026-04-17
other · Open access
The tarball includes the LLVM implementation of the MERIT transformation, all evaluation benchmarks, and the scripts necessary to reproduce the results in this paper. The description PDF contains instructions for performing the evaluation.
IP-FL: Incentive-Driven Personalization in Federated Learning
2025-06-03
article
Federated Learning (FL) is an approach for privacy-preserving Machine Learning (ML), enabling model training across multiple clients without centralized data collection. Existing incentive solutions for traditional FL focus on individual contributions to a single global objective, neglecting the nuances of clustered personalization with multiple cluster-level models and non-monetary incentives such as personalized model appeal for clients. In this paper, we first propose to treat incentivization and personalization as interrelated challenges and solve them with an incentive mechanism that fosters personalized learning. Additionally, current methods depend on an aggregator for client clustering, which is limited by a lack of access to clients' confidential information due to privacy constraints, leading to inaccurate clustering. To overcome this, we propose direct client involvement, allowing clients to indicate their cluster membership preferences based on data distribution and incentive-driven feedback. Our approach enhances the personalized model appeal for self-aware clients with high-quality data, leading to their active and consistent participation. Our evaluation demonstrates significant improvements in test accuracy (8–45%), personalized model appeal (3–38%), and participation rates (31–100%) over existing FL models, including those addressing data heterogeneity and personalization.
ArXiv.org · 2025-12-11
preprint · Open access · Senior author
Modern cloud platforms increasingly host large-scale deep learning (DL) workloads, demanding high-throughput, low-latency GPU scheduling. However, the growing heterogeneity of GPU clusters and limited visibility into application characteristics pose major challenges for existing schedulers, which often rely on offline profiling or application-specific assumptions. We present RLTune, an application-agnostic reinforcement learning (RL)-based scheduling framework that dynamically prioritizes and allocates DL jobs on heterogeneous GPU clusters. RLTune integrates RL-driven prioritization with MILP-based job-to-node mapping to optimize system-wide objectives such as job completion time (JCT), queueing delay, and resource utilization. Trained on large-scale production traces from Microsoft Philly, Helios, and Alibaba, RLTune improves GPU utilization by up to 20%, reduces queueing delay by up to 81%, and shortens JCT by as much as 70%. Unlike prior approaches, RLTune generalizes across diverse workloads without requiring per-job profiling, making it practical for cloud providers to deploy at scale for more efficient, fair, and sustainable DL workload management.
User-based I/O Profiling for Leadership Scale HPC Workloads
2025-01-02 · 2 citations
article · Open access · Senior author
I/O constitutes a significant portion of the runtime of most applications. Spawning many such applications concurrently on an HPC system leads to severe I/O contention. Thus, understanding and subsequently reducing the I/O contention induced by such multi-tenancy is critical for the efficient and reliable performance of the HPC system. In this study, we demonstrate that an application's performance is influenced by the command-line arguments passed to the job submission. We model an application's I/O behavior based on two factors: past I/O behavior within a time window and user-configured I/O settings via command-line arguments. We conclude that I/O patterns for well-known HPC applications like E3SM and LAMMPS are predictable, with an average uncertainty below 0.25 (a probability of 80%) and near zero (a probability of 100%) within a day. However, I/O pattern variance increases as the study time window lengthens. Additionally, we show that for 38 users and at least 50 applications constituting approximately 93,000 job submissions, there is a high correlation between a submitted command line and the command lines submitted by the same user within the previous 1 to 10 days. We claim the length of this time window is unique per user.
Memory Tiering in Python Virtual Machine
2025-10-09
article · Open access · Senior author
Modern Python applications consume massive amounts of memory in data centers. Memory technologies such as CXL have emerged as a pivotal interconnect for memory expansion. Prior efforts in memory tiering that relied on OS page information or hardware counters incurred notable overhead and lacked awareness of fine-grained object access patterns. Moreover, these tiering configurations cannot be tailored to individual Python applications, limiting their applicability in QoS-sensitive environments. In this paper, we introduce Memory Tiering in Python VM (MTP), an extension module built atop the popular CPython interpreter to support memory tiering in Python applications. MTP leverages reference count changes from garbage collection to infer object temperatures and reduces unnecessary migration overhead through a software-defined page temperature table. To the best of our knowledge, MTP is the first framework to offer portability, easy deployment, and per-application tiering customization for Python workloads.
CIWARS: A Web Server for Antibiotic Resistance Surveillance Using Longitudinal Metagenomic Data
Journal of Molecular Biology · 2025-04-21 · 3 citations
article
Multi-Agent Code-Orchestrated Generation for Reliable Infrastructure-as-Code
ArXiv.org · 2025-10-04
preprint · Open access · Senior author
The increasing complexity of cloud-native infrastructure has made Infrastructure-as-Code (IaC) essential for reproducible and scalable deployments. While large language models (LLMs) have shown promise in generating IaC snippets from natural language prompts, their monolithic, single-pass generation approach often results in syntactic errors, policy violations, and unscalable designs. In this paper, we propose MACOG (Multi-Agent Code-Orchestrated Generation), a novel multi-agent LLM-based architecture for IaC generation that decomposes the task into modular subtasks handled by specialized agents: Architect, Provider Harmonizer, Engineer, Reviewer, Security Prover, Cost and Capacity Planner, DevOps, and Memory Curator. The agents interact via a shared-blackboard, finite-state orchestrator layer, and collectively produce Terraform configurations that are not only syntactically valid but also policy-compliant and semantically coherent. To ensure infrastructure correctness and governance, we incorporate Terraform Plan for execution validation and Open Policy Agent (OPA) for customizable policy enforcement. We evaluate MACOG using the IaC-Eval benchmark, where MACOG is the top enhancement across models, e.g., GPT-5 improves from 54.90 (RAG) to 74.02 and Gemini-2.5 Pro from 43.56 to 60.13, with concurrent gains on BLEU, CodeBERTScore, and an LLM-judge metric. Ablations show constrained decoding and deploy feedback are critical: removing them drops IaC-Eval to 64.89 and 56.93, respectively.
TreeCNN and NILMTK Unite: Illuminating Energy Efficiency in Real-World Scenarios
2024-12-15
article · Senior author
Efficiently managing electricity supply and demand, especially during peak times to minimize waste, remains a key challenge for the electric grid. An effective solution involves incentivizing users to shift their shiftable loads, such as dishwashers and washing machines, to off-peak periods. Non-Intrusive Load Monitoring (NILM) provides a cost-effective and pragmatic approach for detailed appliance energy consumption insights. Among Deep Learning models, TreeCNN has shown superior performance compared to RNN and traditional CNN models in energy disaggregation. However, its evaluation has been limited to the Dataport dataset. To fully assess TreeCNN's capabilities, comprehensive testing with diverse datasets like REDD, UK-DALE, DRED, and others is essential. Additionally, integrating TreeCNN into NILMTK, a dataset standardization tool, enables thorough comparisons with 16 formatted datasets and other disaggregation algorithms. In this work, we integrated TreeCNN into the NILMTK toolkit and benchmarked it, providing valuable insights into its effectiveness and real-world usability.
Ensuring Fair LLM Serving Amid Diverse Applications
arXiv (Cornell University) · 2024-11-24
preprint · Open access
In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications and multiple LLM calls, making them unsuitable for such platforms. To address the fairness challenge, this paper analyzes millions of requests from thousands of users on MS CoPilot, a real-world multi-tenant LLM platform hosted by Microsoft. Our analysis confirms the inadequacy of existing methods and guides the development of FairServe, a system that ensures fair LLM access across diverse applications. FairServe proposes application-characteristic-aware request throttling coupled with a scheduling technique based on a weighted service counter to curb abusive behavior and ensure fairness. Our experimental results on real-world traces demonstrate FairServe's superior performance compared to the state-of-the-art method in ensuring fairness. We are actively working on deploying our system in production, expecting to benefit millions of customers worldwide.
Recent grants
NSF · $442k · 2010–2014
NSF · $969k · 2019–2024
CSR: Small: Collaborative Research: Scalable Fine-Grained Cloud Monitoring for Empowering IoT
NSF · $258k · 2016–2020
CAREER: A Scalable Hierarchical Framework for High-Performance Data Storage
NSF · $476k · 2008–2014
DC: Small: Collaborative Research: Exploring Energy-Reliability Trade-offs in Data Storage Systems
NSF · $216k · 2010–2013
Frequent coauthors
- 37 shared
Ali Anwar
- 32 shared
Yue Cheng
University of Virginia
- 26 shared
M. Mustafa Rafique
Rochester Institute of Technology
- 21 shared
Sudharshan S. Vazhkudai
Micron (United States)
- 19 shared
Kirk W. Cameron
Virginia Tech
- 17 shared
Arnab K. Paul
Birla Institute of Technology and Science, Pilani
- 16 shared
Guanying Wang
- 14 shared
Thomas Lux