Ali Butt

Professor · Verified

Virginia Tech · Computer Science

Active 2000–2026

h-index: 28

Citations: 3.5k

Papers: 196 (36 in the last 5 years)

Funding: $3.3M


About

Ali Butt is a Professor and Associate Department Head for Faculty Development in the Department of Computer Science at Virginia Tech. He is also the Director of the stack@cs Center for Computer Systems. His research interests include cloud and high-performance computing systems; systems support for machine and deep learning applications; file, I/O, and storage systems; distributed systems; and large-scale experimental computer systems. He received his Ph.D. in electrical and computer engineering from Purdue University in 2006. His office is Room 4108, Gilbert Place, at Virginia Tech.

Research topics

Computer Science
Operating system
Machine Learning
Distributed computing
Data Mining
Algorithm
Artificial Intelligence
Parallel computing
Mathematics
Geometry
Computer graphics (images)
Simulation
Computer network
Computer hardware
Reliability engineering
Engineering
Database
Statistics
Computer architecture

Selected publications

Eliminate Branches by Melding IR Instructions (Artifact)
Zenodo (CERN European Organization for Nuclear Research) · 2026-04-17
other · Open access
The tarball includes the LLVM implementation of the MERIT transformation, all evaluation benchmarks, and the scripts necessary to reproduce the results in this paper. The description PDF contains instructions for performing the evaluation.
IP-FL: Incentive-Driven Personalization in Federated Learning
2025-06-03
article
Federated Learning (FL) is an approach for privacy-preserving Machine Learning (ML), enabling model training across multiple clients without centralized data collection. Existing incentive solutions for traditional FL focus on individual contributions to a single global objective, neglecting the nuances of clustered personalization with multiple cluster-level models and non-monetary incentives such as personalized model appeal for clients. In this paper, we first propose to treat incentivization and personalization as interrelated challenges and solve them with an incentive mechanism that fosters personalized learning. Additionally, current methods depend on an aggregator for client clustering, which is limited by a lack of access to clients' confidential information due to privacy constraints, leading to inaccurate clustering. To overcome this, we propose direct client involvement, allowing clients to indicate their cluster membership preferences based on data distribution and incentive-driven feedback. Our approach enhances the personalized model appeal for self-aware clients with high-quality data, leading to their active and consistent participation. Our evaluation demonstrates significant improvements in test accuracy (8–45%), personalized model appeal (3–38%), and participation rates (31–100%) over existing FL models, including those addressing data heterogeneity and personalization.
Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters
arXiv · 2025-12-11
preprint · Open access · Senior author
Modern cloud platforms increasingly host large-scale deep learning (DL) workloads, demanding high-throughput, low-latency GPU scheduling. However, the growing heterogeneity of GPU clusters and limited visibility into application characteristics pose major challenges for existing schedulers, which often rely on offline profiling or application-specific assumptions. We present RLTune, an application-agnostic reinforcement learning (RL)-based scheduling framework that dynamically prioritizes and allocates DL jobs on heterogeneous GPU clusters. RLTune integrates RL-driven prioritization with MILP-based job-to-node mapping to optimize system-wide objectives such as job completion time (JCT), queueing delay, and resource utilization. Trained on large-scale production traces from Microsoft Philly, Helios, and Alibaba, RLTune improves GPU utilization by up to 20%, reduces queueing delay by up to 81%, and shortens JCT by as much as 70%. Unlike prior approaches, RLTune generalizes across diverse workloads without requiring per-job profiling, making it practical for cloud providers to deploy at scale for more efficient, fair, and sustainable DL workload management.
User-based I/O Profiling for Leadership Scale HPC Workloads
2025-01-02 · 2 citations
article · Open access · Senior author
I/O constitutes a significant portion of the runtime of most applications. Spawning many such applications concurrently on an HPC system leads to severe I/O contention. Thus, understanding and subsequently reducing the I/O contention induced by such multi-tenancy is critical for the efficient and reliable performance of the HPC system. In this study, we demonstrate that an application's performance is influenced by the command-line arguments passed to the job submission. We model an application's I/O behavior based on two factors: past I/O behavior within a time window and user-configured I/O settings via command-line arguments. We conclude that I/O patterns for well-known HPC applications like E3SM and LAMMPS are predictable, with an average uncertainty below 0.25 (a probability of 80%) and near zero (a probability of 100%) within a day. However, I/O pattern variance increases as the study time window lengthens. Additionally, we show that for 38 users and at least 50 applications constituting approximately 93,000 job submissions, there is a high correlation between a submitted command line and the command lines submitted by the same user within the past 1 to 10 days. We claim the length of this time window is unique per user.
Memory Tiering in Python Virtual Machine
2025-10-09
article · Open access · Senior author
Modern Python applications consume massive amounts of memory in data centers. Memory technologies such as CXL have emerged as a pivotal interconnect for memory expansion. Prior efforts in memory tiering that relied on OS page or hardware counter information incurred notable overhead and lacked awareness of fine-grained object access patterns. Moreover, these tiering configurations cannot be tailored to individual Python applications, limiting their applicability in QoS-sensitive environments. In this paper, we introduce Memory Tiering in Python VM (MTP), an extension module built atop the popular CPython interpreter to support memory tiering in Python applications. MTP leverages reference count changes from garbage collection to infer object temperatures and reduces unnecessary migration overhead through a software-defined page temperature table. To the best of our knowledge, MTP is the first framework to offer portability, easy deployment, and per-application tiering customization for Python workloads.
CIWARS: A Web Server for Antibiotic Resistance Surveillance Using Longitudinal Metagenomic Data
Journal of Molecular Biology · 2025-04-21 · 3 citations
article
Multi-Agent Code-Orchestrated Generation for Reliable Infrastructure-as-Code
arXiv · 2025-10-04
preprint · Open access · Senior author
The increasing complexity of cloud-native infrastructure has made Infrastructure-as-Code (IaC) essential for reproducible and scalable deployments. While large language models (LLMs) have shown promise in generating IaC snippets from natural language prompts, their monolithic, single-pass generation approach often results in syntactic errors, policy violations, and unscalable designs. In this paper, we propose MACOG (Multi-Agent Code-Orchestrated Generation), a novel multi-agent LLM-based architecture for IaC generation that decomposes the task into modular subtasks handled by specialized agents: Architect, Provider Harmonizer, Engineer, Reviewer, Security Prover, Cost and Capacity Planner, DevOps, and Memory Curator. The agents interact via a shared-blackboard, finite-state orchestrator layer, and collectively produce Terraform configurations that are not only syntactically valid but also policy-compliant and semantically coherent. To ensure infrastructure correctness and governance, we incorporate Terraform Plan for execution validation and Open Policy Agent (OPA) for customizable policy enforcement. We evaluate MACOG using the IaC-Eval benchmark, where MACOG is the top enhancement across models, e.g., GPT-5 improves from 54.90 (RAG) to 74.02 and Gemini-2.5 Pro from 43.56 to 60.13, with concurrent gains on BLEU, CodeBERTScore, and an LLM-judge metric. Ablations show constrained decoding and deploy feedback are critical: removing them drops IaC-Eval to 64.89 and 56.93, respectively.
TreeCNN and NILMTK Unite: Illuminating Energy Efficiency in Real-World Scenarios
2024-12-15
article · Senior author
Efficiently managing electricity supply and demand, especially during peak times to minimize waste, remains a key challenge for the electric grid. An effective solution involves incentivizing users to shift their shiftable loads, such as dishwashers and washing machines, to off-peak periods. Non-Intrusive Load Monitoring (NILM) provides a cost-effective and pragmatic approach for detailed appliance energy consumption insights. Among Deep Learning models, TreeCNN has shown superior performance compared to RNN and traditional CNN models in energy disaggregation. However, its evaluation has been limited to the Dataport dataset. To fully assess TreeCNN's capabilities, comprehensive testing with diverse datasets like REDD, UK-DALE, DRED and others is essential. Additionally, integrating TreeCNN into NILMTK, a dataset standardization tool, enables thorough comparisons with 16 formatted datasets and other disaggregation algorithms. In this work, we integrated TreeCNN into the NILMTK toolkit and benchmarked it, providing valuable insights into its effectiveness and real-world usability.
Ensuring Fair LLM Serving Amid Diverse Applications
arXiv (Cornell University) · 2024-11-24
preprint · Open access
In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications and multiple LLM calls, making them unsuitable for such platforms. To address the fairness challenge, this paper analyzes millions of requests from thousands of users on MS CoPilot, a real-world multi-tenant LLM platform hosted by Microsoft. Our analysis confirms the inadequacy of existing methods and guides the development of FairServe, a system that ensures fair LLM access across diverse applications. FairServe proposes application-characteristic aware request throttling coupled with a weighted service counter based scheduling technique to curb abusive behavior and ensure fairness. Our experimental results on real-world traces demonstrate FairServe's superior performance compared to the state-of-the-art method in ensuring fairness. We are actively working on deploying our system in production, expecting to benefit millions of customers world-wide.

Recent grants

CSR: Small: Towards Realizing Cloud HPC: An Adaptive Programming Model for Accelerator-based Clusters
NSF · $442k · 2010–2014
SPX: Collaborative Research: Cross-stack Memory Optimizations for Boosting I/O Performance of Deep Learning HPC Applications
NSF · $969k · 2019–2024
CSR: Small: Collaborative Research: Scalable Fine-Grained Cloud Monitoring for Empowering IoT
NSF · $258k · 2016–2020
CAREER: A Scalable Hierarchical Framework for High-Performance Data Storage
NSF · $476k · 2008–2014
DC: Small: Collaborative Research: Exploring Energy-Reliability Trade-offs in Data Storage Systems
NSF · $216k · 2010–2013

Frequent coauthors

Ali Anwar
37 shared
Yue Cheng
University of Virginia
32 shared
M. Mustafa Rafique
Rochester Institute of Technology
26 shared
Sudharshan S. Vazhkudai
Micron (United States)
21 shared
Kirk W. Cameron
Virginia Tech
19 shared
Arnab K. Paul
Birla Institute of Technology and Science, Pilani
17 shared
Guanying Wang
16 shared
Thomas Lux
14 shared
