Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime
Zachary Ives

Assistant Professor · Verified

University of Pennsylvania · Computer and Information Science

Active 1998–2025

h-index: 40
Citations: 10.7k
Papers: 170 (15 in the last 5 years)
Funding: $4.4M

Research topics

  • Computer Science
  • Information Retrieval
  • Programming language
  • Computer Security
  • Data Mining
  • Database
  • Artificial Intelligence
  • Natural Language Processing
  • Archaeology
  • Theoretical computer science
  • Geology
  • Mathematics
  • Engineering
  • Data science

Selected publications

  • Implementing Views for Property Graphs

    ACM SIGMOD Record · 2025-04-28 · 1 citation

    Article · Senior author

    Property graph databases are increasingly used to integrate heterogeneous data, motivating graph views to abstract, simplify, and unify the data, e.g., to capture individual-level vs. organization-level relationships. This paper considers the tasks of implementing such views using rewriting techniques — both using existing property graph DBMSs and converting to relational RDBMSs. We consider both virtual and materialized views, ways of rewriting queries, and structures for indexing data. We also note a common use case of graph views, which involves preserving a graph except for minor local transformations; we develop novel extensions and semantics for this. We evaluate and compare the performance of our techniques under a variety of workloads, and we compare existing graph and relational DBMS platforms.

  • Low Rank Learning for Offline Query Optimization

    Proceedings of the ACM on Management of Data · 2025-06-17 · 2 citations

    Article · Open access

    Recent deployments of learned query optimizers use expensive neural networks and ad-hoc search policies. To address these issues, we introduce LimeQO, a framework for offline query optimization leveraging low-rank learning to efficiently explore alternative query plans with minimal resource usage. By modeling the workload as a partially observed, low-rank matrix, we predict unobserved query plan latencies using purely linear methods, significantly reducing computational overhead compared to neural networks. We formalize offline exploration as an active learning problem, and present simple heuristics that reduce a 3-hour workload to 1.5 hours after just 1.5 hours of exploration. Additionally, we propose a transductive Tree Convolutional Neural Network (TCNN) that, despite higher computational costs, achieves the same workload reduction with only 0.5 hours of exploration. Unlike previous approaches that place expensive neural networks directly in the query processing "hot" path, our approach offers a low-overhead solution and a no-regressions guarantee, all without making assumptions about the underlying DBMS.

  • Data-Agnostic Cardinality Learning from Imperfect Workloads

    Proceedings of the VLDB Endowment · 2025-04-01

    Article · Open access · Senior author

    Cardinality estimation (CardEst) is a critical aspect of query optimization. Traditionally, it leverages statistics built directly over the data. However, organizational policies (e.g., regulatory compliance) may restrict global data access. Fortunately, query-driven cardinality estimation can learn CardEst models using query workloads. However, existing query-driven models often require access to data or summaries for best performance, and they assume perfect training workloads with complete and balanced join templates (or join graphs). Such assumptions rarely hold in real-world scenarios, in which join templates are incomplete and imbalanced. We present GRASP, a data-agnostic cardinality learning system designed to work under these real-world constraints. GRASP's compositional design generalizes to unseen join templates and is robust to join template imbalance. It also introduces a new per-table CardEst model that handles value distribution shifts for range predicates, and a novel learned count sketch model that captures join correlations across base relations. Across three database instances, we demonstrate that GRASP consistently outperforms existing query-driven models on imperfect workloads, both in terms of estimation accuracy and query latency. Remarkably, GRASP achieves performance comparable to, or even surpassing, traditional approaches built over the underlying data on the complex CEB-IMDb-full benchmark — despite operating without any data access and using only 10% of all possible join templates.

  • A Practical Theory of Generalization in Selectivity Learning

    Proceedings of the VLDB Endowment · 2025-02-01

    Article · Senior author

    Query-driven machine learning models have emerged as a promising estimation technique for query selectivities. Yet, surprisingly little is known about the efficacy of these techniques from a theoretical perspective, as there exist substantial gaps between practical solutions and state-of-the-art (SOTA) theory based on the Probably Approximately Correct (PAC) learning framework. In this paper, we aim to bridge the gaps between theory and practice. First, we demonstrate that selectivity predictors induced by signed measures are learnable, which relaxes the reliance on probability measures in SOTA theory. More importantly, beyond the PAC learning framework (which only allows us to characterize how the model behaves when both training and test workloads are drawn from the same distribution), we establish, under mild assumptions, that selectivity predictors from this class exhibit favorable out-of-distribution (OOD) generalization error bounds. These theoretical advances provide us with a better understanding of both the in-distribution and OOD generalization capabilities of query-driven selectivity learning, and facilitate the design of two general strategies to improve OOD generalization for existing query-driven selectivity models. We empirically verify that our techniques help query-driven selectivity models generalize significantly better to OOD queries both in terms of prediction accuracy and query latency performance, while maintaining their superior in-distribution generalization performance.

  • QuoteInspector: Gaining Insight about Social Media Discussions

    Proceedings of the VLDB Endowment · 2024-08-01

    Article · Senior author

    Our greatest source of insight into the real world today is via social media. Here, a major statement or quote by a public figure (world leader, politician, celebrity, scientist) can have wide-ranging impact, igniting extensive discussions and triggering reactions. It would be helpful to have tools for monitoring, querying, and inspecting the "flow" of social discourse. We introduce QuoteInspector, a system uniquely designed for efficient tracking and analysis of social media discussions around quotes. QuoteInspector leverages modern text embeddings and employs a clustering-based methodology for extracting topics from posts; it further integrates various NLP techniques for in-depth cluster analysis. Additionally, the system enhances the user experience by combining keyword- and relationship-based (structured) search for efficient and precise quote retrieval.

  • Searching Data Lakes for Nested and Joined Data

    Proceedings of the VLDB Endowment · 2024-07-01

    Article · Senior author

    Exploratory data science is driving new platforms that assist data scientists with everyday tasks, such as integration and wrangling, to assemble training datasets. Such tools take scientists' work-in-progress data as a search object (table or JSON) and find relevant supplementary data from an organizational data lake, which can be unioned or joined with the current data. Existing data lake search tools find single, relational tables to match or join with a search object. Yet many data science applications revolve around hierarchical data, which can only be matched by creating views that simultaneously join and transform several tables in the data lake. In this paper, we extend the Juneau data lake search system [46] for this broader class of matches at scale. Our contribution is a general framework for efficiently merging ranked results to match hierarchical data, leveraging novel techniques for indexing and sketching, and incorporating existing single-table search techniques and ranking functions. We experimentally validate our methods' benefits and broad applicability using real data from data science computational notebooks. Our results indicate that, with different ranking functions, our approach can return the optimal set of views up to 4.8x faster and 43% more related compared to heuristics, and increase the data domain coverage by up to 28%. In a case study to show the utility of our results to data science downstream tasks, we reduce regression error by up to 6.6%, and improve classification accuracy by up to 19.5%.

  • A Practical Theory of Generalization in Selectivity Learning

    ArXiv.org · 2024-09-11

    Preprint · Open access · Senior author


  • Low Rank Approximation for Learned Query Optimization

    2024-05-17 · 1 citation

    Article

    We present LimeQO, a learned steering query optimizer based on linear methods, such as matrix completion, for repetitive workloads. LimeQO can forgo expensive neural networks by taking advantage of the low-rank structure of query workloads. Using offline execution, LimeQO can accelerate workloads by up to 2x with zero regressions in just a few hours, while using 100-1000x fewer computational resources than deep learning techniques.

  • Modeling Shifting Workloads for Learned Database Systems

    Proceedings of the ACM on Management of Data · 2024-03-12 · 10 citations

    Article · Senior author

    Learned database systems address several weaknesses of traditional cost estimation techniques in query optimization: they learn a model of a database instance, e.g., as queries are executed. However, when the database instance has skew and correlation, it is nontrivial to create an effective training set that anticipates workload shifts, where query structure changes and/or different regions of the data contribute to query answers. Such a predictive model may perform poorly with these out-of-distribution inputs. In this paper, we study how the notion of a replay buffer can be managed through online algorithms to build a concise yet representative model of the workload distribution, allowing for rapid adaptation and effective prediction of cardinalities and costs. We experimentally validate our methods over several data domains.

  • Implementation Strategies for Views over Property Graphs

    Proceedings of the ACM on Management of Data · 2024-05-29 · 8 citations

    Article · Senior author

    The need to query complex interactions and relationships has motivated interest in property graph database platforms. For some graph applications, graph views are required to abstract the data, e.g., to capture individual-level vs. organization-level relationships, or to show single computational steps vs. composite workflows. Emerging efforts to standardize graph query languages have developed semantics and language constructs for graph views. This paper considers the task of implementing such views using rewriting techniques, both using existing property graph DBMSs and converting to relational RDBMSs. We consider both virtual and materialized views, ways of rewriting queries, and structures for indexing data. We also note a common use case of graph views, which involves preserving a graph except for minor local transformations; we develop novel extensions and semantics for this. We evaluate and compare the performance of our techniques under a variety of workloads, and we compare existing graph and relational DBMS platforms.
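The LimeQO abstracts in this list describe treating a repetitive workload as a partially observed matrix of query-plan latencies and predicting the missing entries with linear, low-rank methods. As a minimal illustration of that general idea, and not LimeQO's actual algorithm, the Python sketch below fits a rank-1 model by alternating least squares over the observed cells, then picks the plan with the lowest predicted latency for each query. The function names `complete_rank1` and `best_plan_per_query`, the rank-1 restriction, and the toy latencies are hypothetical choices made for illustration.

```python
from typing import List, Optional

# Hypothetical layout: rows are queries, columns are candidate plans (e.g., hint sets);
# None marks a (query, plan) pair that has not been executed yet.
LatencyMatrix = List[List[Optional[float]]]


def complete_rank1(w: LatencyMatrix, iters: int = 1000, reg: float = 1e-9) -> List[List[float]]:
    """Predict every cell of w with a rank-1 model u[i] * v[j] fitted by alternating least squares."""
    n_rows, n_cols = len(w), len(w[0])
    observed = [
        (i, j, x)
        for i, row in enumerate(w)
        for j, x in enumerate(row)
        if x is not None
    ]
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # Hold v fixed: each u[i] is a 1-D least-squares fit over row i's observed cells.
        num, den = [0.0] * n_rows, [reg] * n_rows
        for i, j, x in observed:
            num[i] += x * v[j]
            den[i] += v[j] * v[j]
        u = [num[i] / den[i] for i in range(n_rows)]
        # Hold u fixed: each v[j] is a 1-D least-squares fit over column j's observed cells.
        num, den = [0.0] * n_cols, [reg] * n_cols
        for i, j, x in observed:
            num[j] += x * u[i]
            den[j] += u[i] * u[i]
        v = [num[j] / den[j] for j in range(n_cols)]
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


def best_plan_per_query(w: LatencyMatrix) -> List[int]:
    """Index of the plan with the lowest predicted latency for each query."""
    predicted = complete_rank1(w)
    return [min(range(len(row)), key=row.__getitem__) for row in predicted]


if __name__ == "__main__":
    # Toy latencies built from an exact rank-1 matrix, with two cells hidden.
    latencies = [
        [1.0, 2.0, None],
        [2.0, 4.0, 8.0],
        [None, 6.0, 12.0],
    ]
    print(complete_rank1(latencies))
    print(best_plan_per_query(latencies))
```

Because the update for each row or column is a one-dimensional least-squares problem, every step is cheap and purely linear, which is the property the abstracts contrast with neural-network optimizers. A real system would use a higher rank and decide which unobserved cells to execute next; neither is shown here.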
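The "Modeling Shifting Workloads for Learned Database Systems" abstract mentions managing a replay buffer through online algorithms. One standard online algorithm for keeping a bounded, uniformly random sample of an unbounded query stream is reservoir sampling (often called Algorithm R). The sketch below shows it as a hypothetical `ReplayBuffer` class; it is a generic stand-in chosen for illustration, and the paper's own buffer-management policy may differ.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class ReplayBuffer(Generic[T]):
    """Hypothetical fixed-capacity replay buffer kept as a uniform reservoir sample of a stream."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0  # total number of items offered so far
        self._rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            # Buffer not full yet: keep everything.
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen by overwriting a random slot,
            # so every item seen so far is equally likely to be in the buffer.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


if __name__ == "__main__":
    buf: ReplayBuffer[int] = ReplayBuffer(capacity=5, seed=42)
    for query_id in range(1000):
        buf.add(query_id)
    print(buf.seen, sorted(buf.items))
```

A uniform reservoir weights old and new queries equally, which keeps the buffer representative of the whole history but adapts slowly when the workload shifts; a recency-biased sampling policy is one alternative design, though which trade-off the paper makes is not stated in the abstract.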

Education

  • PhD, Computer Science and Engineering

    University of Washington

    2002

See your match with Zachary Ives

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach

  • Free to start
  • No credit card
  • 30-second signup