Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Ian Foster

Professor of Computer Science · Verified

University of Chicago · Computer Science

Active 1980–2026

h-index: 150
Citations: 139.5k
Papers: 1.7k (516 in last 5 years)
Funding: $25.5M (2 active)

About

Ian Foster is an Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago. His research focuses on the development and application of computer science principles, particularly in the areas of scientific computing, data science, and high-performance computing. Foster's work involves exploring innovative computational paradigms and advancing the foundations of data-driven scientific discovery. He is recognized for his contributions to the field of computer science, especially in the context of scientific research and data-intensive applications. Foster's leadership and research have significantly impacted how complex scientific computations are performed and how data is managed and analyzed in interdisciplinary settings, fostering collaboration across academia, industry, and government sectors.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Political Science
  • Biology
  • Data science
  • Economics
  • Database
  • Engineering
  • Data Mining
  • Knowledge management
  • Engineering ethics
  • Ecology
  • Machine Learning
  • Computational biology
  • Biochemistry
  • Business
  • World Wide Web
  • Forestry
  • Algorithm
  • Genetics
  • Chemistry
  • Agroforestry
  • Medicine
  • Law and economics

Selected publications

  • Icicle: Scalable Metadata Indexing and Real-Time Monitoring for HPC File Systems

    arXiv (Cornell University) · 2026-04-11

    article · Open access · Senior author

    Modern HPC file systems can contain billions of files and hundreds of petabytes of data, making even simple questions increasingly intractable to answer. Traditional file system utilities such as find and du fail to scale to these sizes. While external indexing tools like GUFI and Brindexer improve query performance, they remain batch-oriented and unsuitable for heterogeneous, rapidly evolving environments. We present Icicle, a scalable framework for continuous file system metadata indexing and monitoring. Icicle maintains a unified, up-to-date, and queryable view of file system state while supporting both periodic snapshot-based ingestion for bulk metadata updates and event-based ingestion for real-time synchronization from production systems such as Lustre and IBM Storage Scale. Built on Apache Kafka and Apache Flink, Icicle provides high-throughput, fault-tolerant, and horizontally scalable ingestion of metadata events into two complementary search indexes, enabling both individual file discovery and aggregate summary statistics by user, group, and directory. This architecture enables efficient support for both coarse-grained administrative queries and interactive analytics over billions of objects. Our experimental evaluation on production-scale HPC datasets demonstrates order-of-magnitude throughput improvements over existing monitoring and indexing approaches, with tunable options for balancing consistency, latency, and metadata freshness.

  • Automated, reliable, and efficient continental-scale replication of 7.3 petabytes of computational simulation data: A case study

    The International Journal of High Performance Computing Applications · 2026-04-25

    article · Senior author

    We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) computational simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee, a task motivated by a need for increased reliability, capacity, and performance. This task presented significant challenges: the need to move 29 million files twice under time pressure from aging storage hardware; a source file system bottleneck limiting throughput to 1.5 GB/s; frequent site maintenance windows; and the need for complete reliability at scale. We addressed these challenges using a simple replication tool that invoked Globus to transfer large bundles of files while tracking progress in a database, dynamically rerouting transfers to work around maintenance periods and file system limitations. Under the covers, Globus organized transfers to make efficient use of the high-speed Energy Sciences Network (ESnet) and the data transfer nodes deployed at participating sites, and also addressed security, integrity checking, and recovery from a variety of transient failures. This success demonstrates the considerable benefits that can accrue from the adoption of performant data replication infrastructure. The replication tool is available at https://github.com/esgf2-us/data-replication-tools.

  • FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access

    2025-11-07 · 1 citation

    article

    We present the Federated Inference Resource Scheduling Toolkit (FIRST), a framework enabling Inference-as-a-Service across distributed High-Performance Computing (HPC) clusters. FIRST provides cloud-like access to diverse AI models, like Large Language Models (LLMs), on existing HPC infrastructure. Leveraging Globus Auth and Globus Compute, the system allows researchers to run parallel inference workloads via an OpenAI-compliant API on private, secure environments. This cluster-agnostic API allows requests to be distributed across federated clusters, targeting numerous hosted models. FIRST supports multiple inference backends (e.g., vLLM), auto-scales resources, maintains "hot" nodes for low-latency execution, and offers both high-throughput batch and interactive modes. The framework addresses the growing demand for private, secure, and scalable AI inference in scientific workflows, allowing researchers to generate billions of tokens daily on-premises without relying on commercial cloud infrastructure.

  • Radio Afterglow Detection and AI-driven Response (RADAR): A Federated Framework for Gravitational-wave Event Follow-up

    The Astrophysical Journal Supplement Series · 2025-10-01 · 2 citations

    article · Open access

    The landmark detection of both gravitational waves (GWs) and electromagnetic (EM) radiation from the binary neutron star merger GW170817 has spurred efforts to streamline the follow-up of GW alerts in current and future observing runs of ground-based GW detectors. Within this context, the radio band of the EM spectrum presents unique challenges. Sensitive radio facilities capable of detecting the faint radio afterglow seen in GW170817, and with sufficient angular resolution, have small fields of view compared to typical GW localization areas. Additionally, theoretical models predict that the radio emission from binary neutron star mergers can evolve over weeks to years, necessitating long-term monitoring to probe the physics of the various postmerger ejecta components. These constraints, combined with limited radio observing resources, make the development of more coordinated follow-up strategies essential, especially as the next generation of GW detectors promises a dramatic increase in detection rates. Here, we present RADAR, a framework designed to address these challenges by promoting community-driven information sharing, federated data analysis, and system resilience, while integrating AI methods for both GW signal identification and radio data aggregation. We show that it is possible to preserve data rights while sharing models that can help design and/or update follow-up strategies. We demonstrate our approach through a case study of GW170817, and discuss future directions for refinement and broader application.

  • Addressing Reproducibility Challenges in HPC with Continuous Integration

    2025-11-12 · 1 citation

    article · Open access

    The high-performance computing (HPC) community has adopted incentive structures to motivate reproducible research, with major conferences awarding badges to papers that meet reproducibility requirements. Yet, many papers do not meet such requirements. The uniqueness of HPC infrastructure and software, coupled with strict access requirements, may limit opportunities for reproducibility. In the absence of resource access, we believe that regular documented testing, through continuous integration (CI), coupled with complete provenance information, can be used as a substitute. Here, we argue that better HPC-compliant CI solutions will improve reproducibility of applications. We present a survey of reproducibility initiatives and describe the barriers to reproducibility in HPC. To address existing limitations, we present a GitHub Action, CORRECT, that enables secure execution of tests on remote HPC resources. We evaluate CORRECT’s usability across three different types of HPC applications, demonstrating the effectiveness of using CORRECT for automating and documenting reproducibility evaluations.

  • Diamond: Harnessing GPU Resources for Scientific Deep Learning

    2025-09-15

    article

    Modern research computing cyberinfrastructure, such as ACCESS-CI and NAIRR Pilot, offers GPU resources across geographically distributed clusters to accommodate the increasing needs of scientific deep learning (DL) workloads. Even for high-performance computing (HPC) experts, configuring environments and managing DL workloads across supercomputers remain significant barriers. To address these obstacles, we present Diamond, an open-source platform to simplify and streamline the DL lifecycle on HPC. Diamond provides an intuitive graphical interface that abstracts system-level complexity, enabling users to develop, debug, and deploy DL models with minimal overhead. We identify several challenges in building such a platform, including portability, security, and usability, and propose effective architectural solutions to each. Notably, Diamond enables users to share and reuse DL workload environments across systems and collaborators, reducing redundant setup efforts. Experimental results demonstrate that Diamond reduces the time to first successful deployment by an average of 68%, compared to manual configuration with command lines. The Diamond service is available at https://diamondhpc.ai.

  • FragmentGPT: A Unified GPT Model for Fragment Growing, Linking, and Merging in Molecular Design

    ArXiv.org · 2025-09-14

    preprint · Open access

    Fragment-Based Drug Discovery (FBDD) is a popular approach in early drug development, but designing effective linkers to combine disconnected molecular fragments into chemically and pharmacologically viable candidates remains challenging. Further complexity arises when fragments contain structural redundancies, like duplicate rings, which cannot be addressed by simply adding or removing atoms or bonds. To address these challenges in a unified framework, we introduce FragmentGPT, which integrates two core components: (1) a novel chemically-aware, energy-based bond cleavage pre-training strategy that equips the GPT-based model with fragment growing, linking, and merging capabilities, and (2) a novel Reward Ranked Alignment with Expert Exploration (RAE) algorithm that combines expert imitation learning for diversity enhancement, data selection and augmentation for Pareto and composite score optimality, and Supervised Fine-Tuning (SFT) to align the learner policy with multi-objective goals. Conditioned on fragment pairs, FragmentGPT generates linkers that connect diverse molecular subunits while simultaneously optimizing for multiple pharmaceutical goals. It also learns to resolve structural redundancies, such as duplicated fragments, through intelligent merging, enabling the synthesis of optimized molecules. FragmentGPT facilitates controlled, goal-driven molecular assembly. Experiments and ablation studies on real-world cancer datasets demonstrate its ability to generate chemically valid, high-quality molecules tailored for downstream drug discovery tasks.

  • Strategic investments in data democratization for scientific innovation

    The International Journal of High Performance Computing Applications · 2025-09-30

    article · Senior author

    The urgent need for data democratization in scientific research was the focal point of a panel discussion at the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), held in Denver, Colorado, from November 12 to 17, 2023, summarizing the outcomes of that discussion and subsequent conversations. The panelists advocated for strategic investments in financial, human, and technological resources to achieve sustainable data democratization. Emphasizing that data is central to scientific discovery and AI deployment, the panel highlighted barriers such as limited access, inadequate financial incentives for cross-domain collaboration, and a shortage of workforce development initiatives. The recommendations in this article aim to guide decision-makers in fostering an inclusive research community, breaking down research silos, and developing a skilled workforce to advance scientific discovery through data democratization.

  • LangChain-Parsl: Connect Large Language Model Agents to High Performance Computing Resource

    2025-11-07 · 2 citations

    article · Open access

    Large Language Models (LLMs) can improve performance in answering questions beyond their contextual understanding by running external tools, such as a calculator for arithmetic, an online query for real-time weather, etc. For scientific applications, this enables the LLM to perform and analyze simulation runs for more accurate answers. However, the increasing scale of scientific computing requires high-performance computers (HPCs), which are managed by job schedulers. In this work, we integrated Parsl into LangChain tool calling to bridge the gap between the LLM agent and the HPC resource. Two implementations were set up and tested on a local Nvidia GPU workstation and the Polaris/ALCF HPC system. The first setup was implemented by modifying the LangChain tool calling, which converts the LangChain tool calls to Parsl functions and queues them to the Parsl workers for parallel execution. The second approach was achieved by designing a Parsl ensemble function as an LLM tool, which performed parallel tasks. With these implementations, the LLM agent workflow was prompted to run molecular dynamics simulations with different protein structures and simulation conditions. The results show that our Parsl implementations enable parallel execution of scientific tools that are invoked by LLM agents on both local GPU workstations and HPC platforms.

Education

  • B.S., Computer Science

    University of Canterbury

  • Ph.D., Computer Science

    Imperial College

Awards & honors

  • 2024 HPCwire 35 Legends List
  • 2023 IEEE Internet Award
  • 2022 ACM/IEEE Ken Kennedy Award
  • 2020 DOE Office of Science Distinguished Scientist Fellow
  • 2019 IEEE-CS Charles Babbage Award