
Emery Berger
Professor · University of Massachusetts Amherst · Information Science and Technology
Active 1998–2026
About
Emery Berger is a professor in the School of Computer Science at the University of Massachusetts Amherst. His research focuses on memory management and security, particularly heap-based attacks that exploit memory management errors and vulnerabilities in memory allocators. His work includes a formal analysis of the widely deployed memory allocators used by major operating systems such as Windows, Linux, FreeBSD, and OpenBSD, showing that they remain vulnerable to such attacks. He contributed to the design of DieHarder, a memory allocator that offers a high degree of security against heap-based attacks at modest performance overhead. DieHarder has been shown to perform comparably to the Linux allocator in practical applications such as the Firefox web browser. His research combines theoretical analysis with practical implementation to improve the security and reliability of memory management systems.
Research topics
- Artificial Intelligence
- Computer Science
- Programming language
- World Wide Web
- Operating system
Selected publications
Reconsidering "Reconsidering Custom Memory Allocation"
arXiv (Cornell University) · 2026-05-16
Preprint · Open access · Senior author
Programmers using native languages such as C, C++, or Rust can implement custom memory allocation strategies to improve execution time. In their paper titled "Reconsidering Custom Memory Allocation" almost 25 years ago, Berger et al. showed that while per-class allocators provide no significant speedups over a state-of-the-art general-purpose allocator, region-based allocators can improve execution time by allocating and freeing objects in bulk. This paper revisits that work on a modern hardware platform with modern general-purpose allocators to evaluate whether their conclusions still hold. It also augments the benchmark suite with two large real-world applications (Clang and Blender), and introduces a methodology to explore the effect of memory fragmentation on locality in general-purpose allocators. Our results support and extend the original conclusions, demonstrating the locality advantages of region-based custom memory allocators.
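The idea of allocating and freeing objects in bulk can be illustrated with a toy bump-pointer region. This is only a sketch in Python over a `bytearray`, not code from the paper or its benchmarks (which target native languages); the class name `Region` and its methods are hypothetical names chosen for illustration.

```python
class Region:
    """Toy bump-pointer region: each allocation carves the next slice out of
    one contiguous buffer, and reset() releases every allocation at once."""

    def __init__(self, capacity):
        self._buffer = bytearray(capacity)  # one contiguous backing store
        self._offset = 0                    # next free byte

    def alloc(self, size):
        """Return a writable view of `size` bytes; cost is one bounds check and one add."""
        if size < 0 or self._offset + size > len(self._buffer):
            raise MemoryError("region exhausted")
        start = self._offset
        self._offset += size
        return memoryview(self._buffer)[start:start + size]

    def reset(self):
        """Free all objects in bulk by rewinding the bump pointer."""
        self._offset = 0

    @property
    def used(self):
        return self._offset


region = Region(64)
first = region.alloc(16)
second = region.alloc(16)
first[:] = b"x" * 16   # objects allocated together sit next to each other
region.reset()         # no per-object free calls
```

Because consecutive allocations are adjacent in the buffer, objects that are allocated together also tend to be accessed together, which is the kind of locality effect the abstract refers to.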
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-23
Other · Open access · 1st author · Corresponding
UI, Frontend & Visualization
- Convert scalene-gui JavaScript to TypeScript by @emeryberger in https://github.com/plasma-umass/scalene/pull/970
- Add Google Gemini provider, environment variable support, and UI modernization by @emeryberger in https://github.com/plasma-umass/scalene/pull/973
- Vendor assets locally for offline HTML viewer support by @emeryberger in https://github.com/plasma-umass/scalene/pull/983
- Add per-file display mode dropdown for profile filtering (fixes #813) by @emeryberger in https://github.com/plasma-umass/scalene/pull/989
LLM / API Provider Support
- Add support for OpenAI-compatible API servers and Anthropic (fixes #918) by @emeryberger in https://github.com/plasma-umass/scalene/pull/971
- Add Google Gemini provider, environment variable support, and UI modernization by @emeryberger in https://github.com/plasma-umass/scalene/pull/973
CPU & Core Profiling Engine
- Optimize CPU profiling instrumentation by @emeryberger in https://github.com/plasma-umass/scalene/pull/988
- Fix crash when frame.f_lineno is None in Python 3.11+ by @emeryberger in https://github.com/plasma-umass/scalene/pull/976
Multiprocessing, Exec, and Runtime Compatibility
- Fix multiprocessing spawn mode support (#873) by @emeryberger in https://github.com/plasma-umass/scalene/pull/984
- Fix multiprocessing spawn mode sys.argv handling (#846) by @emeryberger in https://github.com/plasma-umass/scalene/pull/986
- Add profiling support for exec'd code (fixes #824) by @emeryberger in https://github.com/plasma-umass/scalene/pull/987
- Fix signal conflict crash with PyTorch Lightning and similar libraries by @emeryberger in https://github.com/plasma-umass/scalene/pull/977
PyTorch & JIT Integration
- Add PyTorch JIT profiling support (fixes #908) by @emeryberger in https://github.com/plasma-umass/scalene/pull/972
GPU & Apple Silicon
- Add per-process MPS GPU profiling for Apple Silicon by @emeryberger in https://github.com/plasma-umass/scalene/pull/974
Windows Support & Reliability
- Improve Windows memory profiling error messages and documentation by @emeryberger in https://github.com/plasma-umass/scalene/pull/978
- Fix Windows CPU profiling not collecting samples by @emeryberger in https://github.com/plasma-umass/scalene/pull/980
Notebook & Editor Integration
- Fix Jupyter notebook display in VSCode (fixes #951) by @emeryberger in https://github.com/plasma-umass/scalene/pull/969
On the Common Pitfalls of Designing and Communicating Within-Subjects Experiments in HCI
2026-04-13
Article · Open access
Well-designed experiments are essential for drawing valid statistical conclusions. In studies with limited resources (e.g., access to human participants), researchers often assign multiple conditions to the same participant (i.e., within-subjects experiments). Although such designs can increase statistical power, dependencies across trials within a participant may threaten the validity of the experiment. Unfortunately, domain-specific assumptions about these dependencies are often left implicit when conducting, analyzing, and communicating results from within-subjects experiments. We show that some common within-subjects experiments in the HCI community make assumptions that may not actually hold and provide a formal representation for precisely encoding these assumptions. We hope these results and examples will motivate changes to how we as a community reason about, design, and communicate experiments.
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-30
Other · Open access · 1st author · Corresponding
What's Changed
Bug Fixes
- Fix Windows CPU profiling extreme slowness and memory explosion (#992): The v2.1 Windows timer loop hardcoded a 1ms sampling interval regardless of the configured rate (default 10ms), generating ~10x more samples than intended and reducing sys.setswitchinterval to 1ms, causing excessive GIL contention. Now uses the actual configured sampling rate.
Improvements
- Bound memory footprint samples with reservoir sampling (#993): Replace unbounded list accumulation of memory footprint samples with sorted_reservoir (Vitter reservoir sampling), capping memory at O(k) instead of O(n), where n is the number of malloc/free events. Eliminates unbounded memory growth during long profiling runs.
Full Changelog: https://github.com/plasma-umass/scalene/compare/v2.1.1...v2.1.2
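The O(k) bound mentioned in the release notes above comes from reservoir sampling in general. The sketch below shows the classic scheme (often called Algorithm R) and is not Scalene's `sorted_reservoir` implementation; the function name `reservoir_sample` and its parameters are assumptions made for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from a stream of
    unknown length, using O(k) memory no matter how long the stream is."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill phase: the first k items are always kept.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# One million simulated footprint samples, but only 100 are ever held in memory.
sample = reservoir_sample(range(1_000_000), 100, random.Random(0))
```

Every item in the stream ends up in the final sample with equal probability, so summary statistics computed from the reservoir remain representative while memory stays bounded.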
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-31
Other · Open access · 1st author · Corresponding
What's Changed
- Fix loop backward-jump sampling bias with even redistribution by @emeryberger in https://github.com/plasma-umass/scalene/pull/995
- Fix Google Analytics loading delay in offline/standalone mode by @Copilot in https://github.com/plasma-umass/scalene/pull/996
- Fixed Jupyter magics. by @emeryberger in https://github.com/plasma-umass/scalene/pull/997
Full Changelog: https://github.com/plasma-umass/scalene/compare/v2.1.2...v2.1.3
FlowBook: Enforcing Reproducibility in Computational Notebooks
arXiv (Cornell University) · 2026-05-02
Preprint · Open access
Computational notebooks are notoriously prone to reproducibility failures. By permitting out-of-order cell execution, notebooks accumulate hidden state and implicit dependencies that cause interactive executions to silently diverge from clean top-to-bottom runs. Prior approaches either employ dependency analyses or enforce reactive dataflow models that face fundamental tradeoffs among expressiveness, precision, and performance. This paper exploits the insight that reproducibility can be enforced without precise dependency tracking: a notebook is reproducible if and only if executing its cells in top-to-bottom order from an empty store produces exactly the outputs currently recorded. We formalize this notion of reproducibility and present FlowBook, which implements a dynamic analysis that enforces reproducibility by tracking read and write sets at cell boundaries. FlowBook detects stale cells whose recorded outputs may no longer reflect the current notebook state and prevents operations that would violate reproducibility. FlowBook incurs near-imperceptible latency overhead (median: 70 ms).
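The definition of reproducibility quoted in the abstract can be checked directly by brute force. The sketch below does that by re-running every cell and is not FlowBook's analysis, which according to the abstract tracks read and write sets at cell boundaries. The cell representation (pairs of source text and recorded printed output) and the names `run_cell` and `is_reproducible` are assumptions made for illustration.

```python
import contextlib
import io


def run_cell(source, store):
    """Execute one cell against the shared store and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, store)
    return buffer.getvalue()


def is_reproducible(cells):
    """cells: list of (source, recorded_output) pairs in notebook order.
    True only if a clean top-to-bottom run from an empty store reproduces
    every recorded output exactly."""
    store = {}  # empty store: no hidden state from earlier interactive runs
    for source, recorded in cells:
        try:
            actual = run_cell(source, store)
        except Exception:
            return False  # a cell that fails in a clean run cannot match
        if actual != recorded:
            return False
    return True


clean = [("x = 1", ""), ("print(x + 1)", "2\n")]
# Recorded "3\n" could come from an out-of-order run in which x was 2 at the time.
stale = [("x = 1", ""), ("print(x + 1)", "3\n")]
```

Here `is_reproducible(clean)` is true and `is_reproducible(stale)` is false. Re-executing the whole notebook on every check is expensive, which is the cost a read/write-set analysis is meant to avoid.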
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-30
Other · Open access · 1st author · Corresponding
What's Changed
- Add library profiler infrastructure for JAX and TensorFlow by @emeryberger in https://github.com/plasma-umass/scalene/pull/990
- Removes stray debug statements on Windows.
Full Changelog: https://github.com/plasma-umass/scalene/compare/v2.1.0...v2.1.1
It’s Not Easy Being Green: On the Energy Efficiency of Programming Languages
2025-11-16
Article · Senior author
Does the choice of programming language affect energy consumption? Previous highly visible studies have established associations between certain programming languages and energy consumption. A causal misinterpretation of this work has led academics and industry leaders to use or support certain languages based on their claimed impact on energy consumption. This paper tackles this causal question directly: it develops a detailed causal model capturing the complex relationship between programming language choice and energy consumption. This model identifies and incorporates several critical but previously overlooked factors that affect energy usage. These factors, such as distinguishing programming languages from their implementations, the impact of the application implementations themselves, the number of active cores, and memory activity, can significantly skew energy consumption measurements if not accounted for. We show, via empirical experiments, improved methodology, and careful examination of anomalies, that when these factors are controlled for, notable discrepancies in prior work vanish. Our analysis suggests that the choice of programming language implementation has no significant impact on energy consumption beyond execution time.
Recent grants
SHF: Large: Collaborative Research: Reliable Performance for Modern Systems
NSF · $346k · 2010–2014
NSF · $180k · 2012–2015
SHF: Large: Collaborative Research: PASS: Perpetually Available Software Systems
NSF · $639k · 2009–2014
CAREER: Cooperative System Support for Robust High Performance
NSF · $471k · 2004–2010
NSF · $250k · 2015–2018
Frequent coauthors
- Benjamin G. Zorn · 21 shared
- Charlie Curtsinger · Grinnell College · 16 shared
- Kathryn S. McKinley · Google (United States) · 14 shared
- Eduardo Quiñones · Universitat Politècnica de Catalunya · 13 shared
- Gene Novark · University of Massachusetts Amherst · 13 shared
- Tongping Liu · University of Massachusetts Amherst · 12 shared
- John Vilk · University of Massachusetts Amherst · 11 shared
- Leonidas Kosmidis · Barcelona Supercomputing Center · 11 shared
Labs
Securing the Heap
Education
- PhD, Department of Computer Sciences · University of Texas at Austin · 2002
Awards & honors
- Microsoft Research Fellowship
- NSF CAREER Award
- Lilly Teaching Fellowship
- Most Influential Paper Awards from ASPLOS, OOPSLA, and PLDI
- Five papers selected as CACM Research Highlights