
Ewa Deelman
Research Professor of Computer Science and Principal Scientist at USC Information Sciences Institute
University of Southern California · Thomas Lord Department of Computer Science
Active 1996–2026
About
Ewa Deelman is a Research Professor of Computer Science and a Principal Scientist at USC Information Sciences Institute. Her main area of research is distributed computing, focusing on supporting complex scientific applications across various computational environments such as campus clusters, grids, and clouds. She has designed new algorithms for job scheduling, resource provisioning, and data storage optimization within scientific workflows. Since 2000, she has conducted research in scientific workflows and led the development of the Pegasus software, which maps complex application workflows onto distributed resources and is utilized by a broad community of researchers in fields including astronomy, bioinformatics, earthquake science, gravitational-wave physics, and limnology. Deelman is also the Principal Investigator for the CI CoE pilot, providing leadership and support to cyberinfrastructure practitioners at NSF Major Facilities and throughout the research ecosystem. Her interests extend to distributed data management, high-level application monitoring, and resource provisioning in grids and clouds. She has co-edited a book on scientific workflows and authored numerous journal articles and conference publications. Additionally, she established the Workshop on Workflows in Support of Large-Scale Science (WORKS), an annual workshop in the field of scientific workflows.
Research topics
- Computer Science
- Artificial Intelligence
- Computer Security
- Data science
- Risk analysis (engineering)
- Engineering
- World Wide Web
- Database
- Management science
Selected publications
Management Communication Quarterly · 2026-04-06
article · Senior author
This qualitative study grounded in phronetic iterative analysis examines how big science organizations adapted to the COVID-19 pandemic while sustaining their scientific missions. Using the communication theory of resilience (CTR) as a guiding framework, infused with the literature on high-reliability organizations (HRO), the analysis draws from 56 semi-structured interviews across three phases (2020–2023) to identify 10 adaptive strategies linked to four of CTR’s five core processes. The fifth, affirming identity anchors, did not surface as an explicit strategy but operated implicitly, suggesting that when organizational identity aligns with crisis demands, explicit identity work may be unnecessary. Flexibility emerged as a central meta-process that shaped how strategies were implemented in context. Key strategies included adjusting work expectations, focusing on outcomes over time/place, and leveraging peer networks. This study contributes to CTR and HRO scholarship by emphasizing the contextual, communicative, and identity-sensitive nature of adaptive resilience during systemic disruption.
The International Journal of High Performance Computing Applications · 2026-03-01
article · Senior author
Future Generation Computer Systems · 2026-05-07
article · Open access · Senior author
A Terminology for Scientific Workflow Systems
Research Explorer (The University of Manchester) · 2025-06-09
preprint · Open access
The term scientific workflow has evolved over the last two decades to encompass a broad range of compositions of interdependent compute tasks and data movements. It has also become an umbrella term for processing in modern scientific applications. Today, many scientific applications can be considered as workflows made of multiple dependent steps, and hundreds of workflow management systems (WMSs) have been developed to manage and run these workflows. However, no turnkey solution has emerged to address the diversity of scientific processes and the infrastructure on which they are implemented. Instead, new research problems requiring the execution of scientific workflows with some novel feature often lead to the development of an entirely new WMS. A direct consequence is that many existing WMSs share some salient features, offer similar functionalities, and can manage the same categories of workflows but also have some distinct capabilities. This situation makes researchers who develop workflows face the complex question of selecting a WMS. This selection can be driven by technical considerations, to find the system that is the most appropriate for their application and for the resources available to them, or other factors such as reputation, adoption, strong community support, or long-term sustainability. To address this problem, a group of WMS developers and practitioners joined their efforts to produce a community-based terminology of WMSs. This paper summarizes their findings and introduces this new terminology to characterize WMSs. This terminology is composed of five axes: workflow characteristics, composition, orchestration, data management, and metadata capture. Each axis comprises several concepts that capture the prominent features of WMSs. Based on this terminology, this paper also presents a classification of 23 existing WMSs according to the proposed axes and terms.
A terminology for scientific workflow systems
Future Generation Computer Systems · 2025-06-24 · 12 citations
article · Open access
Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job Scheduling
2025-11-07 · 2 citations
article · Open access
High-Performance Computing (HPC) job scheduling involves balancing conflicting objectives such as minimizing makespan, reducing wait times, optimizing resource use, and ensuring fairness. Traditional methods, whether heuristic-based, e.g., First-Come-First-Served (FCFS) and Shortest Job First (SJF), or intensive optimization techniques, often lack adaptability to dynamic workloads and, more importantly, cannot simultaneously optimize multiple objectives in HPC systems. To address this, we propose a novel Large Language Model (LLM)-based scheduler using a ReAct-style framework (Reason + Act), enabling iterative, interpretable decision-making. The system incorporates a scratchpad memory to track scheduling history and refine decisions via natural language feedback, while a constraint enforcement module ensures feasibility and safety. We evaluate our approach using OpenAI’s O4-Mini and Anthropic’s Claude 3.7 across seven real-world HPC workload scenarios, including heterogeneous mixes, bursty patterns, and adversarial cases. Comparisons against FCFS, SJF, and Google OR-Tools (on 10 to 100 jobs) reveal that LLM-based scheduling effectively balances multiple objectives while offering transparent reasoning through natural language traces. The method excels in constraint satisfaction and adapts to diverse workloads without domain-specific training. However, a trade-off between reasoning quality and computational overhead challenges real-time deployment. This work presents the first comprehensive study of reasoning-capable LLMs for HPC scheduling, demonstrating their potential to handle multiobjective optimization while highlighting limitations in computational efficiency. The findings provide insights into leveraging advanced language models for complex scheduling problems in dynamic HPC environments.
2025-11-07
article
There are two sources of inaccuracy when simulating parallel and distributed computing systems: (i) a simulator implemented at an insufficient level of detail; and (ii) incorrectly calibrated simulation parameter values. Increasing the simulator’s level of detail can improve accuracy, but at the cost of higher space, time, and/or software complexity. Furthermore, evaluating the intrinsic accuracy of a simulator requires that its parameters be well-calibrated. Making decisions regarding the level of detail is thus challenging. We propose a methodology for instantiating the simulation calibration process and a framework for automating this process, which makes it possible to pick appropriate levels of detail for any simulator. We demonstrate the usefulness of our approach via two case studies for two different domains.
Building Resilience: Lessons Learned by Big Science Organizations During the Pandemic
International Crisis and Risk Communication Association Reports · 2025-07-01
article · Open access
The COVID-19 pandemic challenged organizations worldwide, offering important lessons about resilience and adaptation. This study focuses on resilience of professionals working in Big Science Organizations (BSOs), specifically National Science Foundation (NSF)-funded Major Facilities (MFs), Mid-Scale Research Infrastructures (MSRIs), and other related large research infrastructures. These organizations, such as the U.S. Academic Research Fleet and the Green Bank Observatory, play a critical role in advancing science and had to quickly adjust to pandemic disruptions. Between December 2020 and August 2023, we conducted 56 interviews across three phases. Phase 1 explored early responses (n=13), Phase 2 examined long-term adaptations (n=17), and Phase 3 involved member-checking validation (n=26). Using grounded theory for analysis and participant ratings for validation, we identified five key lessons: (1) pivoting to hybrid work is possible; (2) organizations must respect employees’ personal and family needs; (3) workplaces should offer and promote mental health and counseling resources; (4) not all employees will comply with policies that threaten personal freedoms; and (5) pandemic preparedness planning is essential. These findings, confirmed across phases, highlight the importance of flexibility, employee well-being, and proactive crisis planning. They offer a practical framework to strengthen resilience for big science organizations facing future global disruptions.
Impact of a Cyberinfrastructure Fellowship Program for Undergraduates
2025-07-18
article · Senior author
The CI Compass Fellowship Program (CICF) was developed to increase undergraduate student participation in cyberinfrastructure (CI) and Major and Midscale Facilities (MFs) research, development, and operations. CICF is strategically structured to provide students with an understanding and experience in CI and MF-related fields and to illuminate to students the possibility of CI and MF-related career pathways, thus potentially contributing to the CI and MF workforce. This poster describes the program's structure, the impact of the program's first four years, the evaluation structure and findings, and future plans.
SWARM: Reimagining scientific workflow management systems in a distributed world
The International Journal of High Performance Computing Applications · 2025-05-15 · 3 citations
article · Corresponding
Modern scientific workflows process massive amounts of data from diverse instruments and sensors, leveraging geographically distributed, heterogeneous compute and storage resources, from leadership-class systems to edge devices, connected by high-performance networks. The diversity of resources introduces challenges in harnessing their full potential, with resilience issues arising across applications, system software, networks, storage, and hardware. Today, workflow management systems (WMS) coordinate the execution of computation and data management tasks across target resources. However, WMS’s centralized nature makes them vulnerable to faults and scalability issues that may result in failures of entire computational campaigns. This paper introduces a novel agentic framework for workflow management, fully distributing and decentralizing the WMS functions and modeling them as swarm intelligence agents infused with advanced artificial intelligence solutions and traditional distributed computing algorithms that can make coordinated decisions in the presence of failures of the underlying cyberinfrastructure.
Recent grants
NSF · $1.9M · 2009–2013
NSF · $100k · 2021–2023
SI2-SSI: Pegasus: Automating Compute and Data Intensive Science
NSF · $2.5M · 2017–2023
NSF · $70k · 2020–2022
Designing Scientific Software one Workflow at a Time
NSF · $374k · 2007–2011
Frequent coauthors
- Karan Vahi · University of Southern California · 137 shared
- Rafael Ferreira da Silva · Oak Ridge National Laboratory · 114 shared
- Gideon Juve · 94 shared
- Gaurang Mehta · Institute and Faculty of Actuaries · 92 shared
- Mats Rynge · University of Chicago · 78 shared
- Anirban Mandal · University of North Carolina at Chapel Hill · 77 shared
- Carl Kesselman · University of Southern California · 73 shared
- G. Bruce Berriman · 72 shared
Awards & honors
- 2006 e-Science Best Paper
- 2001 15th Workshop on Parallel and Distributed Simulation Be…
- 1987 Wells College Special Distinction in the field of Mathe…
- AAAS fellow
- IEEE fellow