
Ayse Coskun
Professor – Electrical and Computer Engineering · Affiliated Faculty – Computer Science · Boston University
Active 2006–2026
About
Dr. Ayse Coskun is a Professor of Electrical and Computer Engineering in the College of Engineering at Boston University. She is also an affiliated faculty member in the Department of Computer Science and serves as Director of the Center for Information and Systems Engineering. Her research interests include energy-efficient computing, cloud computing, high-performance computing, computer architecture, and embedded systems. She holds additional affiliations with the Hariri Institute and the Division of Systems Engineering. Her honors include the IBM Faculty Award (2020), the NSF CAREER Award (2012–2017), and the Ernest S. Kuh Early Career Award from the IEEE Council on Electronic Design Automation (2017). She has served as Deputy Editor-in-Chief of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems since 2022 and as an associate editor for journals including ACM Transactions on Architecture and Code Optimization, IEEE Transactions on Computers, and Elsevier's Microelectronics Journal. Dr. Coskun received her Ph.D. from the University of California, San Diego.
Research topics
- Computer Science
- Materials science
- Machine Learning
- Engineering
- Data Mining
- Electrical engineering
- Embedded system
- Programming language
- Operating system
- Computer network
- Distributed computing
- Electronic engineering
- Telecommunications
- Computer architecture
- Optoelectronics
- Nanotechnology
Selected publications
IEEE Energy Sustainability Magazine · 2026-01-28
Article · Senior author
As cloud-scale artificial intelligence (AI) and data processing workloads continue to surge, data center operators are increasingly exploring workload migration strategies to optimize energy costs and reduce environmental impact. While prior work has addressed cost- or carbon-aware migration independently, fewer studies have incorporated power grid strain as an operational constraint alongside these objectives. This article introduces a multiobjective optimization framework for cross-regional workload migration that minimizes electricity cost and carbon emissions while enforcing strict constraints on grid strain and workload performance. Our approach uses real-world data from U.S. independent system operators (ISOs) to evaluate the feasibility and effectiveness of migrating workloads across regions under these combined considerations. We model migration overheads, including latency and runtime-extension penalties, and show how they affect the net benefit of shifting workloads. Our experiments show reductions of up to 13% in carbon emissions and up to 15% in cost, while ensuring that no destination region exceeds safe grid operating thresholds. We also highlight temporal and seasonal patterns that affect migration opportunities. This work provides new insights into sustainable workload migration in geodistributed infrastructures and introduces a practical framework for evaluating carbon- and cost-aware migration decisions under explicit grid strain constraints.
Praxium: Diagnosing Cloud Anomalies with AI-based Telemetry and Dependency Analysis
ArXiv.org · 2026-03-25
Article · Open access · Senior author
As microservice architectures for cloud applications grow in popularity, cloud services are becoming increasingly complex and more vulnerable to misconfigurations and software bugs. Traditional approaches rely on expert input to diagnose and fix microservice anomalies, which does not scale under the continuous integration and continuous deployment (CI/CD) paradigm. Microservice rollouts, which contain new software installations, interact in complex ways with the components of an application. This makes it harder to attribute anomalous behavior to any specific installation or rollout, which can slow resolution. To address these gaps in current diagnostic methods, this paper introduces Praxium, a framework for anomaly detection and root cause inference. Praxium helps administrators evaluate target metric performance in the context of dependency installation information provided by a software discovery tool, PraxiPaaS. Praxium continuously monitors telemetry data to identify anomalies, then performs root cause analysis via causal impact on recent software installations, giving site reliability engineers (SREs) relevant information about an observed anomaly. We demonstrate that Praxium performs effective anomaly detection and root cause inference, and we analyze how to tune anomaly detection hyperparameters in a practical setting. Across 75 trials using four synthetic anomalies, anomaly detection consistently achieves a macro-F1 score above 0.97. We also show that causal impact analysis reliably infers the correct root cause of anomalies, even as package installations occur at increasingly short intervals.
A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems
arXiv (Cornell University) · 2026-04-02
Preprint · Open access
Efficient use of GPU resources and power has become critical as demand for GPUs in high-performance computing (HPC) grows. In this paper, we analyze the GPU utilization, GPU memory utilization, and power consumption of the Vienna ab initio Simulation Package (VASP), using historical logs from the Slurm workload manager and GPU performance metrics collected by NVIDIA's Data Center GPU Manager (DCGM). VASP is a widely used materials science application on Perlmutter at NERSC, an HPE Cray EX system based on NVIDIA A100 GPUs. Building on insights from this resource utilization analysis, we propose a framework that predicts average GPU power, maximum GPU utilization, and maximum GPU memory utilization for applications on heterogeneous HPC systems, to enable more efficient scheduling decisions and power-aware system operation. The framework has two stages: 1) training only on Slurm accounting logs and 2) augmenting the training data with historical GPU profiling metrics collected with DCGM. Maximum GPU utilization predictions using only Slurm submission features achieve up to 97% accuracy. Furthermore, features engineered from GPU compute and memory activity metrics correlate well with average power usage, and our runtime power prediction experiments reach up to 92% accuracy. These findings demonstrate that DCGM metrics effectively capture application characteristics and highlight their potential for building predictive models that support dynamic power management in HPC systems.
Job Grouping Based Intelligent Resource Prediction Framework
Lecture Notes in Computer Science · 2026-01-01
Book chapter · Senior author
Lessons Learned from Anomaly Detection in Chameleon Cloud
2025-09-23
Article · Senior author
Cloud computing has become integral to modern technology infrastructure, supporting services ranging from e-commerce to AI applications. Chameleon is a large-scale, configurable testbed for edge-to-cloud research that offers full bare-metal provisioning, virtualization, and diverse hardware resources; it is built on OpenStack, a leading open-source cloud platform. However, monitoring Chameleon's heterogeneous infrastructure is challenging, particularly across OpenStack services and hardware components, and traditional threshold-based alerting struggles to keep up with the scale and complexity of such environments. In this work, we present an anomaly detection framework for OpenStack services in Chameleon Cloud. We curate and publish the first dataset of resource usage metrics collected from OpenStack control plane services. We evaluate four state-of-the-art unsupervised multivariate time series models (TranAD, Prodigy, USAD, and OmniAnomaly) on this dataset and share key insights from deploying them. Our findings indicate that, for our use case, all models achieve high F1 scores, and training on three days of healthy data effectively balances training cost and detection accuracy.
Analyzing GPU Utilization in HPC Workloads: Insights from Large-Scale Systems
2025-07-18 · 3 citations
Article
UniCoMTE: A Universal Counterfactual Framework for Explaining Time-Series Classifiers on ECG Data
ArXiv.org · 2025-12-18
Article · Open access · Senior author
Machine learning models, particularly deep neural networks, have demonstrated strong performance in classifying complex time series data. However, their black-box nature limits trust and adoption, especially in high-stakes domains such as healthcare. To address this challenge, we introduce UniCoMTE, a model-agnostic framework for generating counterfactual explanations for multivariate time series classifiers. The framework identifies the temporal features that most strongly influence a model's prediction by modifying the input sample and assessing the effect on that prediction. UniCoMTE is compatible with a wide range of model architectures and operates directly on raw time series inputs. In this study, we evaluate UniCoMTE's explanations on a time series ECG classifier. We quantify explanation quality by comparing the comprehensibility of our explanations with that of established techniques (LIME and SHAP) and by assessing their generalizability to similar samples. We also assess clinical utility through a questionnaire completed by medical experts who reviewed counterfactual explanations presented alongside the original ECG samples. Results show that our approach produces concise, stable, and human-aligned explanations that outperform existing methods in both clarity and applicability. By linking model predictions to meaningful signal patterns, the framework advances the interpretability of deep learning models for real-world time series applications.
iScience · 2025-10-06 · 8 citations
Review · Open access
As artificial intelligence (AI) applications demand substantial computational power, the energy consumption of data centers and its attendant carbon footprint are growing at an alarming rate. Driven by this demand, data centers could consume 9% of global electricity by 2030. However, the path toward more sustainable AI and carbon-neutral data centers is hindered by a lack of transparency in data sharing. Without access to operational data from data centers, researchers are limited in developing effective solutions to minimize carbon emissions. Transparency is urgently needed to foster innovation, advance sustainability goals, and create practical strategies for reducing the environmental impact of AI and data centers, especially since these sustainability problems are expected to become even harder to solve with the boom in the AI industry. Taking the actions discussed in this article is essential to ensuring a more sustainable future for the data center industry.
ArXiv.org · 2025-07-01
Preprint · Open access
Artificial intelligence (AI) is fueling exponential growth in electricity demand, threatening grid reliability, raising prices for communities that pay for new energy infrastructure, and stunting AI innovation as data centers wait for interconnection to constrained grids. This paper presents the first field demonstration, in collaboration with major corporate partners, of a software-only approach, Emerald Conductor, that transforms AI data centers into flexible grid resources that can efficiently and immediately harness existing power systems without massive infrastructure buildout. Conducted on a 256-GPU cluster running representative AI workloads within a commercial hyperscale cloud data center in Phoenix, Arizona, the trial achieved a 25% reduction in cluster power usage for three hours during peak grid events while maintaining AI quality of service (QoS) guarantees. By orchestrating AI workloads based on real-time grid signals, without hardware modifications or energy storage, this platform reimagines data centers as grid-interactive assets that enhance grid reliability, advance affordability, and accelerate AI development.
Fast Machine Learning Based Prediction for Temperature Simulation Using Compact Models
2025-03-31 · 3 citations
Article · Senior author
As transistor densities increase, managing thermal challenges in 3D IC designs becomes more complex. Traditional methods such as finite element methods and compact thermal models (CTMs) are computationally expensive, while existing machine learning (ML) models require large datasets and long training times. To address these limitations, we introduce a novel ML framework that integrates with CTMs to accelerate steady-state thermal simulations without requiring large datasets. Our approach achieves up to 70× speedup over state-of-the-art simulators, enabling real-time, high-resolution thermal simulations for 2D and 3D IC designs. (This research was partially funded by NSF grant CCF 2131127.)
Recent grants
NSF · $450k · 2012–2017
SHF: Small: Reclaiming Dark Silicon via 2.5D Integrated Systems with Silicon Photonic Networks
NSF · $450k · 2017–2021
SHF: Small: Collaborative Research: Managing Thermal Integrity in Monolithic 3D Integrated Systems
NSF · $250k · 2019–2022
NSF · $234k · 2017–2020
Frequent coauthors
- Vitus J. Leung, Sandia National Laboratories · 29 shared
- Tajana Rosing · 26 shared
- Sherief Reda · 25 shared
- Burak Aksar, Boston University · 20 shared
- Manuel Egele, Boston University · 20 shared
- Emre Ateş · 18 shared
- Yvain Thonnart, CEA Grenoble · 17 shared
- David Atienza, École Polytechnique Fédérale de Lausanne · 17 shared
Education
- 2007: Ph.D., Computer Science, University of California, San Diego
- 2003: M.S., Computer Science, University of California, San Diego
- 2001: B.S., Computer Engineering, Bogazici University
Awards & honors
- IBM Faculty Award (IBM Global University Program Academic Aw…
- Invited participant at the National Academy of Engineering F…
- Best Artifact Award at the International European Conference…
- Ernest S. Kuh Early Career Award, IEEE Council on Electronic…
- Gauss Award at the International Supercomputing Conference –…