Ewa Deelman

Research Professor of Computer Science and Principal Scientist at USC Information Sciences Institute

University of Southern California · Thomas Lord Department of Computer Science

Active 1996–2026

h-index: 73

Citations: 25.2k

Papers: 544 (128 last 5y)

Funding: $24.2M (2 active)



About

Ewa Deelman is a Research Professor of Computer Science and a Principal Scientist at USC Information Sciences Institute. Her main area of research is distributed computing, focusing on supporting complex scientific applications across various computational environments such as campus clusters, grids, and clouds. She has designed new algorithms for job scheduling, resource provisioning, and data storage optimization within scientific workflows. Since 2000, she has conducted research in scientific workflows and led the development of the Pegasus software, which maps complex application workflows onto distributed resources and is utilized by a broad community of researchers in fields including astronomy, bioinformatics, earthquake science, gravitational-wave physics, and limnology. Deelman is also the Principal Investigator for the CI CoE pilot, providing leadership and support to cyberinfrastructure practitioners at NSF Major Facilities and throughout the research ecosystem. Her interests extend to distributed data management, high-level application monitoring, and resource provisioning in grids and clouds. She has co-edited a book on scientific workflows and authored numerous journal articles and conference publications. Additionally, she established the Workshop on Workflows in Support of Large-Scale Science (WORKS), an annual workshop in the field of scientific workflows.

Research topics

Computer Science
Artificial Intelligence
Computer Security
Data science
Risk analysis (engineering)
Engineering
World Wide Web
Database
Management science

Selected publications

From Disruption to Resilience: Adaptive Strategies in Big Science Organizations During a Global Pandemic
Management Communication Quarterly · 2026-04-06
article · Senior author
This qualitative study grounded in phronetic iterative analysis examines how big science organizations adapted to the COVID-19 pandemic while sustaining their scientific missions. Using the communication theory of resilience (CTR) as a guiding framework, infused with the literature on high-reliability organizations (HRO), the analysis draws from 56 semi-structured interviews across three phases (2020–2023) to identify 10 adaptive strategies linked to four of CTR’s five core processes. The fifth—affirming identity anchors—did not surface as an explicit strategy but operated implicitly, suggesting that when organizational identity aligns with crisis demands, explicit identity work may be unnecessary. Flexibility emerged as a central meta-process that shaped how strategies were implemented in context. Key strategies included adjusting work expectations, focusing on outcomes over time/place, and leveraging peer networks. This study contributes to CTR and HRO scholarship by emphasizing the contextual, communicative, and identity-sensitive nature of adaptive resilience during systemic disruption.
Guest editor’s note: Special issue on system-level innovations for performance and fairness at scale: From interconnects to schedulers
The International Journal of High Performance Computing Applications · 2026-03-01
article · Senior author
AeroResQ: Edge-accelerated UAV framework for scalable, resilient and collaborative escape route planning in wildfire scenarios
Future Generation Computer Systems · 2026-05-07
article · Open access · Senior author
A Terminology for Scientific Workflow Systems
Research Explorer (The University of Manchester) · 2025-06-09
preprint · Open access
The term scientific workflow has evolved over the last two decades to encompass a broad range of compositions of interdependent compute tasks and data movements. It has also become an umbrella term for processing in modern scientific applications. Today, many scientific applications can be considered as workflows made of multiple dependent steps, and hundreds of workflow management systems (WMSs) have been developed to manage and run these workflows. However, no turnkey solution has emerged to address the diversity of scientific processes and the infrastructure on which they are implemented. Instead, new research problems requiring the execution of scientific workflows with some novel feature often lead to the development of an entirely new WMS. A direct consequence is that many existing WMSs share some salient features, offer similar functionalities, and can manage the same categories of workflows but also have some distinct capabilities. This situation makes researchers who develop workflows face the complex question of selecting a WMS. This selection can be driven by technical considerations, to find the system that is the most appropriate for their application and for the resources available to them, or other factors such as reputation, adoption, strong community support, or long-term sustainability. To address this problem, a group of WMS developers and practitioners joined their efforts to produce a community-based terminology of WMSs. This paper summarizes their findings and introduces this new terminology to characterize WMSs. This terminology is composed of five axes: workflow characteristics, composition, orchestration, data management, and metadata capture. Each axis comprises several concepts that capture the prominent features of WMSs. Based on this terminology, this paper also presents a classification of 23 existing WMSs according to the proposed axes and terms.
A terminology for scientific workflow systems
Future Generation Computer Systems · 2025-06-24 · 12 citations
article · Open access
Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job Scheduling
2025-11-07 · 2 citations
article · Open access
High-Performance Computing (HPC) job scheduling involves balancing conflicting objectives such as minimizing makespan, reducing wait times, optimizing resource use, and ensuring fairness. Traditional methods, including heuristics such as First-Come-First-Served (FCFS) and Shortest Job First (SJF) as well as intensive optimization techniques, often lack adaptability to dynamic workloads and, more importantly, cannot simultaneously optimize multiple objectives in HPC systems. To address this, we propose a novel Large Language Model (LLM)-based scheduler using a ReAct-style framework (Reason + Act), enabling iterative, interpretable decision-making. The system incorporates a scratchpad memory to track scheduling history and refine decisions via natural language feedback, while a constraint enforcement module ensures feasibility and safety. We evaluate our approach using OpenAI’s O4-Mini and Anthropic’s Claude 3.7 across seven real-world HPC workload scenarios, including heterogeneous mixes, bursty patterns, and adversarial cases. Comparisons against FCFS, SJF, and Google OR-Tools (on 10 to 100 jobs) reveal that LLM-based scheduling effectively balances multiple objectives while offering transparent reasoning through natural language traces. The method excels in constraint satisfaction and adapts to diverse workloads without domain-specific training. However, a trade-off between reasoning quality and computational overhead challenges real-time deployment. This work presents the first comprehensive study of reasoning-capable LLMs for HPC scheduling, demonstrating their potential to handle multiobjective optimization while highlighting limitations in computational efficiency. The findings provide insights into leveraging advanced language models for complex scheduling problems in dynamic HPC environments.
Determining Levels of Detail for Simulators of Parallel and Distributed Computing Systems via Automated Calibration
2025-11-07
article
There are two sources of inaccuracy when simulating parallel and distributed computing systems: (i) a simulator implemented at an insufficient level of detail; and (ii) incorrectly calibrated simulation parameter values. Increasing the simulator’s level of detail can improve accuracy, but at the cost of higher space, time, and/or software complexity. Furthermore, evaluating the intrinsic accuracy of a simulator requires that its parameters be well-calibrated. Making decisions regarding the level of detail is thus challenging. We propose a methodology for instantiating the simulation calibration process and a framework for automating this process, which makes it possible to pick appropriate levels of detail for any simulator. We demonstrate the usefulness of our approach via two case studies for two different domains.
Building Resilience: Lessons Learned by Big Science Organizations During the Pandemic
International Crisis and Risk Communication Association Reports · 2025-07-01
article · Open access
The COVID-19 pandemic challenged organizations worldwide, offering important lessons about resilience and adaptation. This study focuses on resilience of professionals working in Big Science Organizations (BSOs), specifically National Science Foundation (NSF)-funded Major Facilities (MFs), Mid-Scale Research Infrastructures (MSRIs), and other related large research infrastructures. These organizations, such as the U.S. Academic Research Fleet and the Green Bank Observatory, play a critical role in advancing science and had to quickly adjust to pandemic disruptions. Between December 2020 and August 2023, we conducted 56 interviews across three phases. Phase 1 explored early responses (n=13), Phase 2 examined long-term adaptations (n=17), and Phase 3 involved member-checking validation (n=26). Using grounded theory for analysis and participant ratings for validation, we identified five key lessons: (1) pivoting to hybrid work is possible; (2) organizations must respect employees’ personal and family needs; (3) workplaces should offer and promote mental health and counseling resources; (4) not all employees will comply with policies that threaten personal freedoms; and (5) pandemic preparedness planning is essential. These findings, confirmed across phases, highlight the importance of flexibility, employee well-being, and proactive crisis planning. They offer a practical framework to strengthen resilience for big science organizations facing future global disruptions.
Impact of a Cyberinfrastructure Fellowship Program for Undergraduates
2025-07-18
article · Senior author
The CI Compass Fellowship Program (CICF) was developed to increase undergraduate student participation in cyberinfrastructure (CI) and Major and Midscale Facilities (MFs) research, development, and operations. CICF is strategically structured to provide students with an understanding and experience in CI and MF-related fields and to illuminate to students the possibility of CI and MF-related career pathways, thus potentially contributing to the CI and MF workforce. This poster describes the program's structure, the impact of the program's first four years, the evaluation structure and findings, and future plans.
SWARM: Reimagining scientific workflow management systems in a distributed world
The International Journal of High Performance Computing Applications · 2025-05-15 · 3 citations
article · Corresponding
Modern scientific workflows process massive amounts of data from diverse instruments and sensors, leveraging geographically distributed, heterogeneous compute and storage resources—from leadership-class systems to edge devices—connected by high-performance networks. The diversity of resources introduces challenges in harnessing their full potential, with resilience issues arising across applications, system software, networks, storage, and hardware. Today, workflow management systems (WMS) coordinate the execution of computation and data management tasks across target resources. However, WMS’s centralized nature makes them vulnerable to faults and scalability issues that may result in failures of entire computational campaigns. This paper introduces a novel agentic framework for workflow management, fully distributing and decentralizing the WMS functions and modeling them as swarm intelligence agents infused with advanced artificial intelligence solutions and traditional distributed computing algorithms that can make coordinated decisions in the presence of failures of the underlying cyberinfrastructure.

Recent grants

STCI: Middleware for Monitoring and Troubleshooting of Large-Scale Applications on National Cyberinfrastructure
NSF · $1.9M · 2009–2013
Collaborative Research: EAGER: VisDict - Visual Dictionaries for Enhancing the Communication between Domain Scientists and Scientific Workflow Providers
NSF · $100k · 2021–2023
SI2-SSI: Pegasus: Automating Compute and Data Intensive Science
NSF · $2.5M · 2017–2023
Collaborative Research: PPoSS: Planning: Performance Scalability, Trust, and Reproducibility: A Community Roadmap to Robust Science in High-throughput Applications
NSF · $70k · 2020–2022
Designing Scientific Software one Workflow at a Time
NSF · $374k · 2007–2011

Frequent coauthors

Karan Vahi
University of Southern California
137 shared
Rafael Ferreira da Silva
Oak Ridge National Laboratory
114 shared
Gideon Juve
94 shared
Gaurang Mehta
Institute and Faculty of Actuaries
92 shared
Mats Rynge
University of Chicago
78 shared
Anirban Mandal
University of North Carolina at Chapel Hill
77 shared
Carl Kesselman
University of Southern California
73 shared
G. Bruce Berriman
72 shared

Awards & honors

2006 e-Science Best Paper
2001 15th Workshop on Parallel and Distributed Simulation Be…
1987 Wells College Special Distinction in the field of Mathe…
AAAS fellow
IEEE fellow
