Ayse Coskun

Professor – Electrical and Computer Engineering · Affiliated Faculty – Computer Science · Verified

Boston University · Computer Science

Active 2006–2026

h-index 34

Citations 4.4k

Papers 227 · 72 last 5y

Funding $1.4M

About

Dr. Ayse Coskun is a Professor of Electrical and Computer Engineering in the College of Engineering at Boston University. She is also an affiliated faculty member in the Department of Computer Science and serves as the Director of the Center for Information and Systems Engineering. Her research interests include energy-efficient computing, cloud computing, high performance computing, computer architecture, and embedded systems. Dr. Coskun holds additional affiliations with the Hariri Institute and the Division of Systems Engineering. She has received numerous honors and awards, including the IBM Faculty Award in 2020, the NSF CAREER Award from 2012 to 2017, and the Ernest S. Kuh Early Career Award from the IEEE Council on Electronic Design Automation in 2017. Her editorial roles include Deputy Editor-in-Chief of the IEEE Transactions on Computer Aided Design since 2022, and associate editor positions for several prominent journals such as ACM Transactions on Architecture and Code Optimization, IEEE Transactions on Computers, and Elsevier Microelectronics Journal. Dr. Coskun earned her PhD from the University of California, San Diego, and is actively involved in advancing research in her fields of expertise.

Research topics

Computer Science
Materials science
Machine Learning
Engineering
Data Mining
Electrical engineering
Embedded system
Programming language
Operating system
Computer network
Distributed computing
Electronic engineering
Telecommunications
Computer architecture
Optoelectronics
Nanotechnology

Selected publications

Optimizing Workload Migration for Carbon and Cost Reductions Under Grid Constraints: New Insights and a Practical Evaluation Framework
IEEE Energy Sustainability Magazine · 2026-01-28
article · Senior author
As cloud-scale artificial intelligence (AI) and data processing workloads continue to surge, data center operators are increasingly exploring workload migration strategies to optimize energy costs and reduce environmental impact. While prior work has addressed cost- or carbon-aware migration independently, fewer studies have incorporated power grid strain as an operational constraint alongside these objectives. This article introduces a multiobjective optimization framework for cross-regional workload migration that minimizes electricity cost and carbon emissions, while enforcing strict constraints on grid strain and workload performance. Our approach uses real-world data from U.S. independent system operators (ISOs) to evaluate the feasibility and effectiveness of migrating workloads across regions under these combined considerations. We model migration overheads, including penalties from latency and runtime extension, and show how they influence the net benefit of shifting workloads. Our experiments show that up to 13% carbon and up to 15% cost reductions can be achieved while ensuring that no destination region exceeds safe grid operating thresholds. Furthermore, we highlight temporal and seasonal patterns that impact migration opportunities. This work provides new insights for sustainable workload migration strategies in geodistributed infrastructures and introduces a practical framework for evaluating carbon- and cost-aware migration decisions under explicit grid strain constraints.
Praxium: Diagnosing Cloud Anomalies with AI-based Telemetry and Dependency Analysis
ArXiv.org · 2026-03-25
article · Open access · Senior author
As the modern microservice architecture for cloud applications grows in popularity, cloud services are becoming increasingly complex and more vulnerable to misconfiguration and software bugs. Traditional approaches rely on expert input to diagnose and fix microservice anomalies, which lacks scalability in the face of the continuous integration and continuous deployment (CI/CD) paradigm. Microservice rollouts, containing new software installations, have complex interactions with the components of an application. Consequently, this added difficulty in attributing anomalous behavior to any specific installation or rollout results in potentially slower resolution times. To address the gaps in current diagnostic methods, this paper introduces Praxium, a framework for anomaly detection and root cause inference. Praxium aids administrators in evaluating target metric performance in the context of dependency installation information provided by a software discovery tool, PraxiPaaS. Praxium continuously monitors telemetry data to identify anomalies, then conducts root cause analysis via causal impact on recent software installations, in order to provide site reliability engineers (SRE) relevant information about an observed anomaly. In this paper, we demonstrate that Praxium is capable of effective anomaly detection and root cause inference, and we provide an analysis on effective anomaly detection hyperparameter tuning as needed in a practical setting. Across 75 total trials using four synthetic anomalies, anomaly detection consistently performs at >0.97 macro-F1. In addition, we show that causal impact analysis reliably infers the correct root cause of anomalies, even as package installations occur at increasingly shorter intervals.
A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems
arXiv (Cornell University) · 2026-04-02
preprint · Open access
Efficient utilization of GPU resources and power has become critical with the growing demand for GPUs in high-performance computing (HPC). In this paper, we analyze GPU utilization and GPU memory utilization, as well as the power consumption of the Vienna ab initio Simulation Package (VASP), using the Slurm workload manager historical logs and GPU performance metrics collected by NVIDIA's Data Center GPU Manager (DCGM). VASP is a widely used materials science application on Perlmutter at NERSC, an HPE Cray EX system based on NVIDIA A100 GPUs. Using our insights from the resource utilization analysis of VASP applications, we propose a resource prediction framework to predict the average GPU power, maximum GPU utilization, and maximum GPU memory utilization values of heterogeneous HPC system applications to enable more efficient scheduling decisions and power-aware system operation. Our prediction framework consists of two stages: 1) using only the Slurm accounting logs as training data and 2) augmenting the training data with historical GPU profiling metrics collected with DCGM. The maximum GPU utilization predictions using only the Slurm submission features achieve up to 97% accuracy. Furthermore, features engineered from GPU-compute and memory activity metrics exhibit good correlations with average power utilization, and our runtime power usage prediction experiments result in up to 92% prediction accuracy. These findings demonstrate the effectiveness of DCGM metrics in capturing application characteristics and highlight their potential for developing predictive models to support dynamic power management in HPC systems.
Job Grouping Based Intelligent Resource Prediction Framework
Lecture notes in computer science · 2026-01-01
book chapter · Senior author
Lessons Learned from Anomaly Detection in Chameleon Cloud
2025-09-23
article · Senior author
Cloud computing has become integral to modern technology infrastructure, supporting a wide range of services from e-commerce to AI applications. Chameleon is a large-scale, configurable testbed designed to enable edge-to-cloud research through full bare-metal provisioning, virtualization, and diverse hardware resources; it is built on OpenStack, a leading open-source cloud platform. However, monitoring Chameleon’s heterogeneous infrastructure is challenging, particularly across OpenStack services and hardware components. Traditional threshold-based alerting methods struggle to keep up with the scale and complexity of such environments. In this work, we present an anomaly detection framework for OpenStack services in the Chameleon Cloud. We curate and publish the first dataset of resource usage metrics collected from OpenStack control plane services. We evaluate four state-of-the-art unsupervised multivariate time series models, namely TranAD, Prodigy, USAD, and OmniAnomaly, on this dataset and share key insights from deploying them. Our findings indicate that for our use case, while all models achieve high F1 scores, training with three days of healthy data effectively balances training cost and detection accuracy.
Analyzing GPU Utilization in HPC Workloads: Insights from Large-Scale Systems
2025-07-18 · 3 citations
article
UniCoMTE: A Universal Counterfactual Framework for Explaining Time-Series Classifiers on ECG Data
ArXiv.org · 2025-12-18
article · Open access · Senior author
Machine learning models, particularly deep neural networks, have demonstrated strong performance in classifying complex time series data. However, their black-box nature limits trust and adoption, especially in high-stakes domains such as healthcare. To address this challenge, we introduce UniCoMTE, a model-agnostic framework for generating counterfactual explanations for multivariate time series classifiers. The framework identifies temporal features that most heavily influence a model's prediction by modifying the input sample and assessing its impact on the model's prediction. UniCoMTE is compatible with a wide range of model architectures and operates directly on raw time series inputs. In this study, we evaluate UniCoMTE's explanations on a time series ECG classifier. We quantify explanation quality by comparing our explanations' comprehensibility to comprehensibility of established techniques (LIME and SHAP) and assessing their generalizability to similar samples. Furthermore, clinical utility is assessed through a questionnaire completed by medical experts who review counterfactual explanations presented alongside original ECG samples. Results show that our approach produces concise, stable, and human-aligned explanations that outperform existing methods in both clarity and applicability. By linking model predictions to meaningful signal patterns, the framework advances the interpretability of deep learning models for real-world time series applications.
Why transparency matters for sustainable data centers and carbon-neutral artificial intelligence (AI)
iScience · 2025-10-06 · 8 citations
review · Open access
As artificial intelligence (AI) applications demand substantial computational power, their energy consumption and the attendant carbon footprint of data centers are accelerating at an alarming rate. Due to increasing demand, data centers could consume 9% of global electricity demand by 2030. However, the path toward more sustainable AI and carbon-neutral data centers is hindered by the lack of transparency in data sharing. Without access to operational data from data centers, researchers face limitations in developing effective solutions to minimize carbon emissions. Transparency is urgently needed to foster innovation, advance sustainability goals, and create practical strategies for reducing the environmental impact of AI and data centers. Transparency will help us to innovate around sustainability problems that are expected to become even harder to solve with the booming of the AI industry. Taking the necessary actions discussed in this article is essential to ensure a more sustainable future of the data center industry.
Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona
ArXiv.org · 2025-07-01
preprint · Open access
Artificial intelligence (AI) is fueling exponential electricity demand growth, threatening grid reliability, raising prices for communities paying for new energy infrastructure, and stunting AI innovation as data centers wait for interconnection to constrained grids. This paper presents the first field demonstration, in collaboration with major corporate partners, of a software-only approach--Emerald Conductor--that transforms AI data centers into flexible grid resources that can efficiently and immediately harness existing power systems without massive infrastructure buildout. Conducted at a 256-GPU cluster running representative AI workloads within a commercial, hyperscale cloud data center in Phoenix, Arizona, the trial achieved a 25% reduction in cluster power usage for three hours during peak grid events while maintaining AI quality of service (QoS) guarantees. By orchestrating AI workloads based on real-time grid signals without hardware modifications or energy storage, this platform reimagines data centers as grid-interactive assets that enhance grid reliability, advance affordability, and accelerate AI's development.
Fast Machine Learning Based Prediction for Temperature Simulation Using Compact Models
2025-03-31 · 3 citations
article · Senior author
As transistor densities increase, managing thermal challenges in 3D IC designs becomes more complex. Traditional methods like finite element methods and compact thermal models (CTMs) are computationally expensive, while existing machine learning (ML) models require large datasets and a long training time. To address these challenges with the ML models, we introduce a novel ML framework that integrates with CTMs to accelerate steady-state thermal simulations without needing large datasets. Our approach achieves up to 70× speedup over state-of-the-art simulators, enabling real-time, high-resolution thermal simulations for 2D and 3D IC designs. (This research was partially funded by the NSF CCF 2131127 grant.)

Recent grants

CAREER: 3D Stacked Systems for Energy-Efficient Computing: Innovative Strategies in Modeling and Runtime Management
NSF · $450k · 2012–2017
SHF: Small: Reclaiming Dark Silicon via 2.5D Integrated Systems with Silicon Photonic Networks
NSF · $450k · 2017–2021
SHF: Small: Collaborative Research: Managing Thermal Integrity in Monolithic 3D Integrated Systems
NSF · $250k · 2019–2022
CI-New: Collaborative Research: Modeling the Next-Generation Hybrid Cooling Systems for High-Performance Processors
NSF · $234k · 2017–2020

Frequent coauthors

Vitus J. Leung
Sandia National Laboratories
29 shared
Tajana Rosing
26 shared
Sherief Reda
25 shared
Burak Aksar
Boston University
20 shared
Manuel Egele
Boston University
20 shared
Emre Ateş
18 shared
Yvain Thonnart
CEA Grenoble
17 shared
David Atienza
École Polytechnique Fédérale de Lausanne
17 shared

Labs

Data Mining & Data Management · PI

Education

Ph.D., Computer Science
University of California, San Diego
2007
M.S., Computer Science
University of California, San Diego
2003
B.S., Computer Engineering
Bogazici University
2001

Awards & honors

IBM Faculty Award (IBM Global University Program Academic Aw…
Invited participant at the National Academy of Engineering F…
Best Artifact Award at the International European Conference…
Ernest S. Kuh Early Career Award, IEEE Council on Electronic…
Gauss Award at the International Supercomputing Conference –…
