
Aaron Elmore
Associate Professor of Computer Science · University of Chicago · Computer Science
Active 2010–2025
About
Aaron Elmore is an Associate Professor of Computer Science at the University of Chicago.
Research signals
Research topics
- Computer Science
- Data Mining
- Artificial Intelligence
- Information Retrieval
- Mathematics
- Database
- Social Science
- Machine Learning
- Statistics
- Operating system
- Algorithm
- World Wide Web
- Computational science
- Biology
- Distributed computing
- Software engineering
- Econometrics
- Programming language
- Materials science
- Data science
- Parallel computing
Selected publications
Beyond Compression: A Comprehensive Evaluation of Lossless Floating-Point Compression
Proceedings of the VLDB Endowment · 2025-07-01 · 5 citations
article · Senior author
Modern data-intensive applications generate vast amounts of floating-point data, essential for fields like databases and machine learning. While many compression techniques focus on space efficiency, there is a lack of benchmarks evaluating both compression and query performance, especially in areas like in-situ query execution on compressed data and machine learning tasks such as distance measurement and k-nearest neighbors (k-NN) in Retrieval-Augmented Generation (RAG) systems. This paper addresses this gap by evaluating popular lossless floating-point compression methods on three key factors: compression efficiency, database operations performance, and machine learning query performance. We implemented these techniques in Rust and integrated them into an open-source library for use with columnar engines. Our comparison highlights trade-offs between compression efficiency and query performance, showing that no single approach excels in all areas, and some methods trade off compression for slower performance.
2025-06-22
article · Open access
Enhancing Transaction Processing through Indirection Skipping
Proceedings of the VLDB Endowment · 2025-07-01
article
In modern database management systems (DBMS), data retrieval typically requires traversing multiple layers (such as secondary indexes, primary indexes, and buffer pools), which introduces significant overhead and creates performance bottlenecks. In this paper, we propose a novel method that minimizes this overhead by establishing more direct access paths during data retrieval. Our experimental results demonstrate substantial efficiency gains across various DBMS components, including secondary indexing and concurrency control mechanisms. Specifically, we observe that implementing direct access paths can boost the throughput of transaction processing systems by up to 19.7× when executing the TPC-C-like benchmark with 40 threads. Furthermore, our approach holds promise for broader applications, potentially transforming data retrieval practices by enabling efficient handling of data movements with minimal overhead.
VUS: Effective and Efficient Accuracy Measures for Time-Series Anomaly Detection
ArXiv.org · 2025-02-18
preprint · Open access
Anomaly detection (AD) is a fundamental task for time-series analytics with important implications for the downstream performance of many applications. In contrast to other domains where AD mainly focuses on point-based anomalies (i.e., outliers in standalone observations), AD for time series is also concerned with range-based anomalies (i.e., outliers spanning multiple observations). Nevertheless, it is common to use traditional point-based information retrieval measures, such as Precision, Recall, and F-score, to assess the quality of methods by thresholding the anomaly score to mark each point as an anomaly or not. However, mapping discrete labels into continuous data introduces unavoidable shortcomings, complicating the evaluation of range-based anomalies. Notably, the choice of evaluation measure may significantly bias the experimental outcome. Despite over six decades of attention, there has never been a large-scale systematic quantitative and qualitative analysis of time-series AD evaluation measures. This paper extensively evaluates quality measures for time-series AD to assess their robustness under noise, misalignments, and different anomaly cardinality ratios. Our results indicate that measures producing quality values independently of a threshold (i.e., AUC-ROC and AUC-PR) are more suitable for time-series AD. Motivated by this observation, we first extend the AUC-based measures to account for range-based anomalies. Then, we introduce a new family of parameter-free and threshold-independent measures, Volume Under the Surface (VUS), to evaluate methods while varying parameters. We also introduce two optimized implementations for VUS that significantly reduce the execution time of the initial implementation. Our findings demonstrate that our four measures are significantly more robust in assessing the quality of time-series AD methods.
VUS: effective and efficient accuracy measures for time-series anomaly detection
The VLDB Journal · 2025-03-27 · 18 citations
article · Open access
Riveter: Adaptive Query Suspension and Resumption Framework for Cloud Native Databases
2024-05-13
article
In modern cloud environments, ephemeral resources with intermittent availability and fluctuating monetary costs are becoming common. This dynamic nature presents a new challenge when deploying cloud-native databases: adaptive query execution, which can suspend queries when the resources are scarce or costs unexpectedly soar, and then resume them when the resources become available or cost-effective. Addressing this challenge requires the design and implementation of query suspension and resumption with a mechanism that can adaptively determine when, if, and how to suspend queries. In this paper, we propose Riveter, a query suspension and resumption framework that can adaptively pause ongoing queries using various strategies, including (1) a redo strategy that terminates queries and subsequently re-runs them, (2) a pipeline-level strategy that suspends a query once one of its pipelines has completed to reduce the storage requirements for intermediate data, and (3) a process-level strategy that enables the suspension of query execution processes at any given moment but generates a substantial volume of intermediate data for query resumption. We also devise a cost model to estimate query latency using various strategies and an algorithm to select the one that causes minimum latency. To demonstrate the effectiveness of Riveter, we conduct evaluations based on the TPC-H benchmark to investigate intermediate data persistence, strategy selection, and cost model-based estimation. Our results not only present the difference among the strategies of Riveter in terms of the size of persisted intermediate data and the time of triggering the suspension but also confirm the adaptive and efficient query suspension and resumption delivered by Riveter.
AdaEdge: A Dynamic Compression Selection Framework for Resource Constrained Devices
2024-05-13 · 5 citations
article · Senior author
With the Internet of Things (IoT), a vast number of connected devices generate significant data, necessitating efficient compression techniques to manage storage costs and enhance query performance. However, a “one-size-fits-all” approach to data compression is ineffective due to diverse applications, which vary in data characteristics, workloads, and hardware limitations. This paper introduces AdaEdge, a dynamic, hardware-conscious compression selection framework tailored for resource-constrained devices. AdaEdge is a best-effort compression selection framework designed to preserve application-critical information as much as possible within system constraints. It enhances the use of limited system resources through a dynamic data compression policy that considers the staleness and the significance of the data. AdaEdge applies a multi-armed bandit algorithm to assist compression selection, optimizing workload targets such as compression ratio, compression throughput, workload accuracy, or their weighted combinations. It supports both lossy and lossless compression selection, adapting to hardware constraints. It operates in both online and offline modes, addressing network constraints for edge nodes and evolving data policies to preserve workload-specific information. AdaEdge improves machine learning task accuracy by up to 30% over baseline within the same storage budget and by up to 20% in scenarios where lossless methods fall short due to low compression ratios. AdaEdge also shows robustness against data shifts and hardware variability.
Proceedings of the VLDB Endowment · 2023-04-01 · 23 citations
article
Similarity search is a core analytical task, and its performance critically depends on the choice of distance measure. For time-series querying, elastic measures achieve state-of-the-art accuracy but are computationally expensive. Thus, fast lower bounding (LB) measures prune unnecessary comparisons with elastic distances to accelerate similarity search. Despite decades of attention, there has never been a study to assess the progress in this area. In addition, the research has disproportionately focused on one popular elastic measure, while other accurate measures have received little or no attention. Therefore, there is merit in developing a framework to accumulate knowledge from previously developed LBs and eliminate the notoriously challenging task of designing separate LBs for each elastic measure. In this paper, we perform the first comprehensive study of 11 LBs spanning 5 elastic measures using 128 datasets. We identify four properties that constitute the effectiveness of LBs and propose the Generalized Lower Bounding (GLB) framework to satisfy all desirable properties. GLB creates cache-friendly summaries, adaptively exploits summaries of both query and target time series, and captures boundary distances in an unsupervised manner. GLB outperforms all LBs in speedup (e.g., up to 13.5× faster against the strongest LB in terms of pruning power), establishes new state-of-the-art results for the 5 elastic measures, and provides the first LBs for 2 elastic measures with no known LBs. Overall, GLB enables the effective development of LBs to facilitate fast similarity search.
Rotary: A Resource Arbitration Framework for Progressive Iterative Analytics
2023-04-01 · 1 citation
article
Increasingly, modern computing applications employ progressive iterative analytics, as best exemplified by two prevalent cases, approximate query processing (AQP) and deep learning training (DLT). In comparison to classic computing applications that only return the results after processing all the input data, progressive iterative analytics keep providing approximate or partial results to users by performing computations on a subset of the entire dataset until either the users are satisfied with the results, or the predefined completion criteria are achieved. Typically, progressive iterative analytic jobs have various completion criteria, produce diminishing returns, and process data at different rates, which necessitates a novel resource arbitration that can continuously prioritize the progressive iterative analytic jobs and determine if/when to reallocate and preempt the resources. We propose and design a resource arbitration framework, Rotary, and implement two resource arbitration systems, Rotary-AQP and Rotary-DLT, for approximate query processing and deep learning training. We build a TPC-H based AQP workload and a survey-based DLT workload to evaluate the two systems, respectively. The evaluation results demonstrate that Rotary-AQP and Rotary-DLT outperform the state-of-the-art systems and confirm the generality and practicality of the proposed resource arbitration framework.
arXiv (Cornell University) · 2023-05-05
preprint · Open access
Pooling and sharing data increases and distributes its value. But since data cannot be revoked once shared, scenarios that require controlled release of data for regulatory, privacy, and legal reasons default to not sharing. Because selectively controlling what data to release is difficult, the few data-sharing consortia that exist are often built around data-sharing agreements resulting from long and tedious one-off negotiations. We introduce Data Station, a data escrow designed to enable the formation of data-sharing consortia. Data owners share data with the escrow knowing it will not be released without their consent. Data users delegate their computation to the escrow. The data escrow relies on delegated computation to execute queries without releasing the data first. Data Station leverages hardware enclaves to generate trust among participants, and exploits the centralization of data and computation to generate an audit log. We evaluate Data Station on machine learning and data-sharing applications while running on an untrusted intermediary. In addition to important qualitative advantages, we show that Data Station: i) outperforms federated learning baselines in accuracy and runtime for the machine learning application; ii) is orders of magnitude faster than alternative secure data-sharing frameworks; and iii) introduces small overhead on the critical path.
Recent grants
CAREER: Intermittent Query Processing
NSF · $580k · 2021–2026
Frequent coauthors
- Sanjay Krishnan · University of Chicago · 38 shared
- Michael J. Franklin · University of Chicago · 37 shared
- John Paparrizos · The Ohio State University · 22 shared
- Zechao Shang · Snowflake Inc. (United States) · 21 shared
- Samuel Madden · 21 shared
- Tim Mattson · 18 shared
- Michael Stonebraker · Massachusetts Institute of Technology · 17 shared
- Aditya Parameswaran · 15 shared
Education
- 2015 · Ph.D., Computer Science · University of California, Santa Barbara
- M.S. · University of Chicago
Awards & honors
- 2021 CAREER Award
- UChicago CS Faculty Receive Industry Grants From J.P. Morgan…
- Aaron Elmore Promoted to Associate Professor at UChicago Com…
- Six UChicago CS Faculty Receive CAREER Awards
- Intel Research Awards Go To Two UChicagoCS Faculty for Video…