
Samuel Madden
Massachusetts Institute of Technology · Electrical Engineering & Computer Science
Active 1972–2024
Research topics
- Computer Science
- Artificial Intelligence
- Database
- Information Retrieval
- Operating system
- Parallel computing
- Mathematics
- Theoretical computer science
- Computer vision
- Computer network
- Distributed computing
Selected publications
A Study of the Fundamental Performance Characteristics of GPUs and CPUs for Database Analytics
2020 · 95 citations
- Computer Science
- Parallel computing
There has been a significant amount of excitement and recent work on GPU-based database systems. Previous work has claimed that these systems can perform orders of magnitude better than CPU-based database systems on analytical workloads such as those found in decision support and business intelligence applications. A hardware expert would view these claims with suspicion. Given the general notion that database operators are memory-bandwidth bound, one would expect the maximum gain to be roughly equal to the ratio of the memory bandwidth of the GPU to that of the CPU. In this paper, we adopt a model-based approach to understand when and why the performance gains of running queries on GPUs vs. on CPUs vary from the bandwidth ratio (which is roughly 16× on modern hardware). We propose Crystal, a library of parallel routines that can be combined to run full SQL queries on a GPU with minimal materialization overhead. We implement individual query operators to show that while the speedups for selection, projection, and sorts are near the bandwidth ratio, joins achieve less speedup due to differences in hardware capabilities. Interestingly, we show on a popular analytical workload that the full-query performance gain from running on a GPU exceeds the bandwidth ratio, despite individual operators having speedups below the bandwidth ratio. This is a result of limitations of vectorizing chained operators on CPUs, and it yields a 25× speedup for GPUs over CPUs on the benchmark.
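The bandwidth-ratio argument in this abstract can be illustrated with a small sketch. It assumes a purely memory-bandwidth-bound operator that reads its input once; the bandwidth figures and function names are hypothetical, chosen only so that the ratio comes out at the roughly 16× quoted above. It is not the model used in the paper.

```python
# Minimal sketch of a bandwidth-bound cost model (illustrative only).
# ASSUMPTION: the two bandwidth constants are hypothetical round numbers
# picked to give a 16x ratio; they are not measurements from the paper.

CPU_BANDWIDTH_BYTES_PER_S = 50e9   # hypothetical CPU memory bandwidth
GPU_BANDWIDTH_BYTES_PER_S = 800e9  # hypothetical GPU memory bandwidth


def scan_time_seconds(bytes_read: float, bandwidth_bytes_per_s: float) -> float:
    """Time for an operator whose cost is dominated by reading its input once."""
    return bytes_read / bandwidth_bytes_per_s


def bandwidth_ratio(cpu_bw: float, gpu_bw: float) -> float:
    """Upper bound on speedup expected if the operator is bandwidth bound."""
    return gpu_bw / cpu_bw


if __name__ == "__main__":
    column_bytes = 4e9  # e.g. one billion 4-byte integers
    cpu_t = scan_time_seconds(column_bytes, CPU_BANDWIDTH_BYTES_PER_S)
    gpu_t = scan_time_seconds(column_bytes, GPU_BANDWIDTH_BYTES_PER_S)
    print(f"CPU scan: {cpu_t:.4f} s, GPU scan: {gpu_t:.4f} s")
    print(f"Modelled speedup: {cpu_t / gpu_t:.1f}x")
```

Under this simple model the speedup of any single bandwidth-bound operator is capped at the bandwidth ratio, which is why the abstract treats both the lower join speedups and the 25× full-query result as deviations that need explaining.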
MIRIS: Fast Object Track Queries in Video
2020 · 68 citations
Senior author · Corresponding
- Computer Science
- Artificial Intelligence
Video databases that enable queries with object-track predicates are useful in many applications. Such queries include selecting objects that move from one region of the camera frame to another (e.g., finding cars that turn right through a junction) and selecting objects with certain speeds (e.g., finding animals that stop to drink water from a lake). Processing such predicates efficiently is challenging because they involve the movement of an object over several video frames. We propose a novel query-driven tracking approach that integrates query processing with object tracking to efficiently process object track queries and address the computational complexity of object detection methods. By processing video at low framerates when possible, but increasing the framerate when needed to ensure high accuracy on a query, our approach substantially speeds up query execution. We have implemented query-driven tracking in MIRIS, a video query processor, and compare MIRIS against four baselines on a diverse dataset consisting of five sources of video and nine distinct queries. We find that, at the same accuracy, MIRIS accelerates video query processing by 9× on average over the IOU tracker, an overlap-based tracking-by-detection method used in existing video database systems.
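For readers unfamiliar with the overlap measure behind the IOU tracker baseline named above, the following is a minimal sketch of intersection-over-union between two axis-aligned bounding boxes. The box layout `(x1, y1, x2, y2)` and the function name `iou` are assumptions made for illustration; this is not code from MIRIS, and it shows only the overlap score, not a full tracker.

```python
from typing import Tuple

# ASSUMPTION: boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes, in [0, 1]."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An overlap-based tracker links detections in consecutive frames when this score is high. At low framerates objects move further between sampled frames, so overlap between the same object's boxes tends to drop, which is one way to see why a fixed overlap rule and aggressive frame skipping pull against each other.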
Starling: A Scalable Query Engine on Cloud Functions
2020 · 66 citations
Senior author · Corresponding
- Computer Science
- Database
Much like on-premises systems, the natural choice for running database analytics workloads in the cloud is to provision a cluster of nodes to run a database instance. However, analytics workloads are often bursty or low volume, leaving clusters idle much of the time, meaning customers pay for compute resources even when underutilized. The ability of cloud function services, such as AWS Lambda or Azure Functions, to run small, fine-granularity tasks makes them appear to be a natural choice for query processing in such settings. But implementing an analytics system on cloud functions comes with its own set of challenges. These include managing hundreds of tiny stateless resource-constrained workers, handling stragglers, and shuffling data through opaque cloud services. In this paper we present Starling, a query execution engine built on cloud function services that employs a number of techniques to mitigate these challenges, providing interactive query latency at a lower total cost than provisioned systems with low-to-moderate utilization. In particular, on a 1TB TPC-H dataset in cloud storage, Starling is less expensive than the best provisioned systems for workloads when queries arrive 1 minute apart or more. Starling also has lower latency than competing systems reading from cloud object stores and can scale to larger datasets.
Sat2Graph: Road Graph Extraction Through Graph-Tensor Encoding
Lecture notes in computer science · 2020 · 102 citations
- Computer Science
- Artificial Intelligence
Recent grants
III: Medium: Massively Parallel Data Analytics on Heterogeneous Architectures
NSF · $1.2M · 2018–2023
CSR-CSI: XStream, a Distributed Stream Processor for Heterogeneous Sensor Systems
NSF · $350k · 2007–2010
NSF · $200k · 2005–2009
BD Spokes: SPOKE: NORTHEAST: Collaborative: A Licensing Model and Ecosystem for Data Sharing
NSF · $816k · 2016–2021
Collaborative Research: IDBR: VoxNet- A Deployable Bioacoustic Sensor Network
NSF · $135k · 2008–2012
Frequent coauthors
- 68 shared
Michael Stonebraker
Massachusetts Institute of Technology
- 59 shared
Joseph M. Hellerstein
University of California, Berkeley
- 47 shared
Hari Balakrishnan
IIT@MIT
- 35 shared
Michael J. Franklin
University of Chicago
- 34 shared
Lei Cao
Harbin Medical University
- 32 shared
Tim Kraska
Amazon (United States)
- 30 shared
Favyen Bastani
- 28 shared
Manasi Vartak