Aaron Elmore

Associate Professor of Computer Science · Verified

University of Chicago · Computer Science

Active 2010–2025

h-index: 25

Citations: 2.3k

Papers: 113 (38 in the last 5 years)

Funding: $580k (1 active)

About

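Aaron Elmore is an Associate Professor of Computer Science at the University of Chicago. His selected publications span database systems topics, including lossless floating-point compression, transaction processing, query suspension and resumption for cloud-native databases, and time-series analytics. He holds an NSF CAREER award for intermittent query processing.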

Research signals

Five dimensions sourced from public faculty / publication signals.

Research topics

Computer Science
Data Mining
Artificial Intelligence
Information Retrieval
Mathematics
Database
Social Science
Machine Learning
Statistics
Operating system
Algorithm
World Wide Web
Computational science
Biology
Distributed computing
Software engineering
Econometrics
Programming language
Materials science
Data science
Parallel computing

Selected publications

Beyond Compression: A Comprehensive Evaluation of Lossless Floating-Point Compression
Proceedings of the VLDB Endowment · 2025-07-01 · 5 citations
article · Senior author
Modern data-intensive applications generate vast amounts of floating-point data, essential for fields like databases and machine learning. While many compression techniques focus on space efficiency, there is a lack of benchmarks evaluating both compression and query performance, especially in areas like in-situ query execution on compressed data and machine learning tasks such as distance measurement and k-nearest neighbors (k-NN) in Retrieval-Augmented Generation (RAG) systems. This paper addresses this gap by evaluating popular lossless floating-point compression methods on three key factors: compression efficiency, database operations performance, and machine learning query performance. We implemented these techniques in Rust and integrated them into an open-source library for use with columnar engines. Our comparison highlights trade-offs between compression efficiency and query performance, showing that no single approach excels in all areas, and some methods trade off compression for slower performance.
Not-So-Bitter Pill to Swallow: Slipstreaming Memory Safe Programming via Rust as part of a Database Systems Course
2025-06-22
article · Open access
Enhancing Transaction Processing through Indirection Skipping
Proceedings of the VLDB Endowment · 2025-07-01
article
In modern database management systems (DBMS), data retrieval typically requires traversing multiple layers—such as secondary indexes, primary indexes, and buffer pools—which introduces significant overhead and creates performance bottlenecks. In this paper, we propose a novel method that minimizes this overhead by establishing more direct access paths during data retrieval. Our experimental results demonstrate substantial efficiency gains across various DBMS components, including secondary indexing and concurrency control mechanisms. Specifically, we observe that implementing direct access paths can boost the throughput of transaction processing systems by up to 19.7× when executing the TPC-C-like benchmark with 40 threads. Furthermore, our approach holds promise for broader applications, potentially transforming data retrieval practices by enabling efficient handling of data movements with minimal overhead.
VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection
arXiv · 2025-02-18
preprint · Open access
Anomaly detection (AD) is a fundamental task for time-series analytics with important implications for the downstream performance of many applications. In contrast to other domains where AD mainly focuses on point-based anomalies (i.e., outliers in standalone observations), AD for time series is also concerned with range-based anomalies (i.e., outliers spanning multiple observations). Nevertheless, it is common to use traditional point-based information retrieval measures, such as Precision, Recall, and F-score, to assess the quality of methods by thresholding the anomaly score to mark each point as an anomaly or not. However, mapping discrete labels into continuous data introduces unavoidable shortcomings, complicating the evaluation of range-based anomalies. Notably, the choice of evaluation measure may significantly bias the experimental outcome. Despite over six decades of attention, there has never been a large-scale systematic quantitative and qualitative analysis of time-series AD evaluation measures. This paper extensively evaluates quality measures for time-series AD to assess their robustness under noise, misalignments, and different anomaly cardinality ratios. Our results indicate that measures producing quality values independently of a threshold (i.e., AUC-ROC and AUC-PR) are more suitable for time-series AD. Motivated by this observation, we first extend the AUC-based measures to account for range-based anomalies. Then, we introduce a new family of parameter-free and threshold-independent measures, Volume Under the Surface (VUS), to evaluate methods while varying parameters. We also introduce two optimized implementations for VUS that reduce significantly the execution time of the initial implementation. Our findings demonstrate that our four measures are significantly more robust in assessing the quality of time-series AD methods.
VUS: effective and efficient accuracy measures for time-series anomaly detection
The VLDB Journal · 2025-03-27 · 18 citations
article · Open access
Riveter: Adaptive Query Suspension and Resumption Framework for Cloud Native Databases
2024-05-13
article
In modern cloud environments, ephemeral resources with intermittent availability and fluctuating monetary costs are becoming common. This dynamic nature presents a new challenge when deploying cloud-native databases: adaptive query execution, which can suspend queries when the resources are scarce or costs unexpectedly soar, and then resume them when the resources become available or cost-effective. Addressing this challenge requires the design and implementation of query suspension and resumption with a mechanism that can adaptively determine when, if, and how to suspend queries. In this paper, we propose Riveter, a query suspension and resumption framework that can adaptively pause ongoing queries using various strategies, including (1) a redo strategy that terminates queries and subsequently re-runs them, (2) a pipeline-level strategy that suspends a query once one of its pipelines has completed to reduce the storage requirements for intermediate data, and (3) a process-level strategy that enables the suspension of query execution processes at any given moment but generates a substantial volume of intermediate data for query resumption. We also devise a cost model to estimate query latency using various strategies and an algorithm to select the one that causes minimum latency. To demonstrate the effectiveness of Riveter, we conduct evaluations based on the TPC-H benchmark to investigate intermediate data persistence, strategy selection, and cost model-based estimation. Our results not only present the difference among the strategies of Riveter in terms of the size of persisted intermediate data and the time of triggering the suspension but also confirm the adaptive and efficient query suspension and resumption delivered by Riveter.
AdaEdge: A Dynamic Compression Selection Framework for Resource Constrained Devices
2024-05-13 · 5 citations
article · Senior author
With the Internet of Things (IoT), a vast number of connected devices generate significant data, necessitating efficient compression techniques to manage storage costs and enhance query performance. However, a “one-size-fits-all” approach to data compression is ineffective due to diverse applications, which vary in data characteristics, workloads, and hardware limitations. This paper introduces AdaEdge, a dynamic, hardware-conscious compression selection framework tailored for resource-constrained devices. AdaEdge is a best-effort compression selection framework designed to preserve application-critical information as much as possible within system constraints. It enhances the use of limited system resources through a dynamic data compression policy that considers the staleness and the significance of the data. AdaEdge applies a multi-armed bandit algorithm to assist compression selection, optimizing workload targets such as compression ratio, compression throughput, workload accuracy, or their weighted combinations. It supports both lossy and lossless compression selection, adapting to hardware constraints. It operates in both online and offline modes, addressing network constraints for edge nodes and evolving data policies to preserve workload-specific information. AdaEdge improves machine learning task accuracy by up to 30% over baseline within the same storage budget and by up to 20% in scenarios where lossless methods fall short due to low compression ratios. AdaEdge also shows robustness against data shifts and hardware variability.
Accelerating Similarity Search for Elastic Measures: A Study and New Generalization of Lower Bounding Distances
Proceedings of the VLDB Endowment · 2023-04-01 · 23 citations
article
Similarity search is a core analytical task, and its performance critically depends on the choice of distance measure. For time-series querying, elastic measures achieve state-of-the-art accuracy but are computationally expensive. Thus, fast lower bounding (LB) measures prune unnecessary comparisons with elastic distances to accelerate similarity search. Despite decades of attention, there has never been a study to assess the progress in this area. In addition, the research has disproportionately focused on one popular elastic measure, while other accurate measures have received little or no attention. Therefore, there is merit in developing a framework to accumulate knowledge from previously developed LBs and eliminate the notoriously challenging task of designing separate LBs for each elastic measure. In this paper, we perform the first comprehensive study of 11 LBs spanning 5 elastic measures using 128 datasets. We identify four properties that constitute the effectiveness of LBs and propose the Generalized Lower Bounding (GLB) framework to satisfy all desirable properties. GLB creates cache-friendly summaries, adaptively exploits summaries of both query and target time series, and captures boundary distances in an unsupervised manner. GLB outperforms all LBs in speedup (e.g., up to 13.5× faster against the strongest LB in terms of pruning power), establishes new state-of-the-art results for the 5 elastic measures, and provides the first LBs for 2 elastic measures with no known LBs. Overall, GLB enables the effective development of LBs to facilitate fast similarity search.
Rotary: A Resource Arbitration Framework for Progressive Iterative Analytics
2023-04-01 · 1 citation
article
Increasingly modern computing applications employ progressive iterative analytics, as best exemplified by two prevalent cases, approximate query processing (AQP) and deep learning training (DLT). In comparison to classic computing applications that only return the results after processing all the input data, progressive iterative analytics keep providing approximate or partial results to users by performing computations on a subset of the entire dataset until either the users are satisfied with the results, or the predefined completion criteria are achieved. Typically, progressive iterative analytic jobs have various completion criteria, produce diminishing returns, and process data at different rates, which necessitates a novel resource arbitration that can continuously prioritize the progressive iterative analytic jobs and determine if/when to reallocate and preempt the resources. We propose and design a resource arbitration framework, Rotary, and implement two resource arbitration systems, Rotary-AQP and Rotary-DLT, for approximate query processing and deep learning training. We build a TPC-H based AQP workload and a survey-based DLT workload to evaluate the two systems, respectively. The evaluation results demonstrate that Rotary-AQP and Rotary-DLT outperform the state-of-the-art systems and confirm the generality and practicality of the proposed resource arbitration framework.
Data Station: Delegated, Trustworthy, and Auditable Computation to Enable Data-Sharing Consortia with a Data Escrow
arXiv · 2023-05-05
preprint · Open access
Pooling and sharing data increases and distributes its value. But since data cannot be revoked once shared, scenarios that require controlled release of data for regulatory, privacy, and legal reasons default to not sharing. Because selectively controlling what data to release is difficult, the few data-sharing consortia that exist are often built around data-sharing agreements resulting from long and tedious one-off negotiations. We introduce Data Station, a data escrow designed to enable the formation of data-sharing consortia. Data owners share data with the escrow knowing it will not be released without their consent. Data users delegate their computation to the escrow. The data escrow relies on delegated computation to execute queries without releasing the data first. Data Station leverages hardware enclaves to generate trust among participants, and exploits the centralization of data and computation to generate an audit log. We evaluate Data Station on machine learning and data-sharing applications while running on an untrusted intermediary. In addition to important qualitative advantages, we show that Data Station: i) outperforms federated learning baselines in accuracy and runtime for the machine learning application; ii) is orders of magnitude faster than alternative secure data-sharing frameworks; and iii) introduces small overhead on the critical path.

Recent grants

CAREER: Intermittent Query Processing
NSF · $580k · 2021–2026

Frequent coauthors

Sanjay Krishnan
University of Chicago
38 shared
Michael J. Franklin
University of Chicago
37 shared
John Paparrizos
The Ohio State University
22 shared
Zechao Shang
Snowflake Inc. (United States)
21 shared
Samuel Madden
21 shared
Tim Mattson
18 shared
Michael Stonebraker
Massachusetts Institute of Technology
17 shared
Aditya Parameswaran
15 shared

Education

Ph.D., Computer Science
University of California, Santa Barbara
2015
M.S.
University of Chicago

Awards & honors

2021 CAREER Award
UChicago CS Faculty Receive Industry Grants From J.P. Morgan…
Aaron Elmore Promoted to Associate Professor at UChicago Com…
Six UChicago CS Faculty Receive CAREER Awards
Intel Research Awards Go To Two UChicagoCS Faculty for Video…
