Andreas Haeberlen

Associate Professor · Verified

University of Pennsylvania · Computer and Information Science

Active 2000–2026

h-index 36
Citations 5.8k
Papers 113 · 11 last 5y
Funding $2.6M · 1 active
Research topics

  • Computer Science
  • Distributed computing
  • Computer network
  • Data Mining
  • Theoretical computer science
  • Database
  • World Wide Web
  • Algorithm
  • Programming language
  • Operating system
  • Embedded system
  • Parallel computing
  • Mathematics

Selected publications

  • Running Distributed Systems like Clockwork

    Leibniz-Zentrum für Informatik (Schloss Dagstuhl) · 2026-01-01

    article · Open access

    Distributed Systems are commonly built using a set of standard assumptions: we assume that message delays are unbounded, that any packet can be lost in the network, and that clocks cannot be closely synchronized. On the one hand, these conservative assumptions result in robust systems that can operate reliably in a wide variety of conditions. On the other hand, they also force the system to do a lot of complex ad-hoc coordination and thus limit the performance it can achieve. In this paper, we take a look at what lies beyond this standard model. We observe that, on modern hardware in a single-tenant data center, distributed systems are able to closely coordinate and essentially "run like clockwork" with very little effort. If we are willing to additionally rule out some worst-case failure scenarios, this results in a large performance improvement, both in practice and even in theory. We demonstrate this effect using state-machine replication (SMR) as a case study: our SMR protocol, Watchmaker, exceeds the throughput of state-of-the-art algorithms by two orders of magnitude, and it requires only half as many replicas to tolerate the same number of faults.

  • RoboRebound: Multi-Robot System Defense with Bounded-Time Interaction

    2025-03-26 · 1 citation

    article

    Byzantine Fault Tolerance (BFT) is a classic technique for defending distributed systems against a wide range of faults and attacks. However, existing solutions are designed for systems where nodes can interact only by exchanging messages. They are not directly applicable to systems where nodes have sensors and actuators and can also interact in the physical world - perhaps by blocking each other's path or by crashing into each other.

  • Modeling Metastability

    2025-11-17

    article

    Recently, there has been increasing concern about a new failure mode in data-center systems: when there is an external shock, such as a sudden load spike or some machine failures, systems will sometimes respond with reduced throughput - but, in contrast to a traditional overload situation, the throughput does not recover once the external shock disappears, and remains permanently degraded. This phenomenon has been called a metastable failure.

  • Metaverse as a Service

    2023-10-30 · 5 citations

    article · Open access · 1st author · Corresponding

    We present a vision for the future of an emerging category of cloud service: the metaverse of 3D virtual worlds. Today, hundreds of millions of users are active daily in such worlds, but they are partitioned into small groups of at most a few hundred players. Each group joins a different virtual world instance, and players can only interact in 3D with other players in the same group during that session. Current platforms are designed in ways that simply cannot scale much further, and solutions from other cloud services do not generalize to the more interactive, bidirectional, and latency-sensitive interactive 3D domain. We outline some of the technical challenges that currently stand in the way of a metaverse without inherent technical limitations on the number of users in a shared experience. We argue that, although these obviously touch on many other areas of Computer Science such as computer graphics and numerical simulation, the core challenges lie squarely within the systems domain.

  • Arboretum: A Planner for Large-Scale Federated Analytics with Differential Privacy

    2023-10-03 · 3 citations

    article · Senior author

    Federated analytics is a way to answer queries over sensitive data that is spread across multiple parties, without sharing the data or collecting it in a single place. Prior work has developed solutions that can scale to large deployments with millions of devices but, due to the distributed nature of federated analytics, these solutions can support only a limited class of queries - typically various forms of numerical queries, which can be answered with lightweight cryptographic primitives. Supporting richer queries, such as categorical queries, requires heavier cryptography, whose cost can quickly exceed even the resources of a powerful data center.

  • Mycelium

    2021 · 19 citations

    Senior author · Corresponding
    • Computer Science
    • Theoretical computer science

    This paper introduces Mycelium, the first system to process differentially private queries over large graphs that are distributed across millions of user devices. Such graphs occur, for instance, when tracking the spread of diseases or malware. Today, the only practical way to query such graphs is to upload them to a central aggregator, which requires a great deal of trust from users and rules out certain types of studies entirely. With Mycelium, users' private data never leaves their personal devices unencrypted, and each user receives strong privacy guarantees. Mycelium does require the help of a central aggregator with access to a data center, but the aggregator merely facilitates the computation by providing bandwidth and computation power; it never learns the topology of the graph or the underlying data. Mycelium accomplishes this with a combination of homomorphic encryption, a verifiable secret redistribution scheme, and a mix network based on telescoping circuits. Our evaluation shows that Mycelium can answer a range of different questions from the medical literature with millions of devices.

  • REBOUND

    2021-04-21 · 4 citations

    article · Open access

    This paper shows how to use bounded-time recovery (BTR) to defend distributed systems against non-crash faults and attacks. Unlike many existing fault-tolerance techniques, BTR does not attempt to completely mask all symptoms of a fault; instead, it ensures that the system returns to the correct behavior within a bounded amount of time. This weaker guarantee is sufficient, e.g., for many cyber-physical systems, where physical properties - such as inertia and thermal capacity - prevent quick state changes and thus limit the damage that can result from a brief period of undefined behavior.

  • Do Not Overpay for Fault Tolerance!

    2021-05-01 · 6 citations

    article · Senior author

    In this paper, we argue that distributed real-time and embedded systems sometimes “overpay” for fault tolerance, by using a protocol that is more powerful than what is actually needed, or by failing to take advantage of unique features in these systems. As a result, these systems sometimes perform more computation or communication than is strictly necessary, or they can be unnecessarily complex, and thus more difficult to analyze. We take a look at the design space for two common problems, broadcast and consensus, and we show that, in a number of scenarios that would be common in real-time systems, these problems have trivial solutions. We then examine two solutions from the literature and propose alternatives that are substantially simpler, less expensive, and more reliable.

  • DNA: Dynamic Resource Allocation for Soft Real-Time Multicore Systems

    2021 · 15 citations

    Senior author · Corresponding
    • Computer Science
    • Distributed computing

    Modern latency-sensitive and real-time systems often use multi-core platforms; thus, tasks on different cores share certain hardware resources, such as the memory bus and certain cache levels. This has two undesirable consequences: (1) tasks can interfere with each other, causing high latency for the system as a whole, and (2) it becomes difficult to meet deadlines, since the worst-case timing of a given task depends on all the tasks it might have to compete with. Static partitioning isolates tasks from each other by allocating a certain fraction of the resources to each; however, many tasks execute in different phases (e.g., memory-intensive and CPU-intensive) that have different requirements. Thus, system designers are left with a choice between overprovisioning, based on the most demanding phase, or suboptimal performance. In this paper, we propose a pair of techniques, called DNA and DADNA, to address the above challenge. DNA increases throughput and decreases latency, by building an execution profile of each task to identify the phases, and then dynamically allocating resources based on which task can benefit the most; DADNA further adds support for soft real-time workloads by taking deadlines into account. We have built a prototype of both techniques in the Xen hypervisor; our experimental results show that, compared to a state-of-the-art solution, DNA and DADNA can substantially improve schedulability, reduce job deadline miss ratios, and cut latencies by more than a factor of two even in extremely overloaded situations.

  • Bounded-time recovery for distributed real-time systems

    2020-04-01 · 8 citations

    article · Senior author

    This paper explores bounded-time recovery (BTR), a new approach to making cyber-physical systems robust to crash faults. Rather than trying to mask the symptoms of a fault with massive redundancy, BTR detects faults at runtime and enables the system to recover from them – e.g., by transferring tasks to other nodes that are still working correctly. When a fault does occur, there is a brief period of instability during which the system can produce incorrect outputs. However, many cyber-physical systems have physical properties – such as inertia or thermal capacity – that limit the rate at which the state of the system can change; thus, a very brief outage is often acceptable, as long as its duration can be bounded, to perhaps a few milliseconds. BTR has some interesting properties: for instance, it has a much lower overhead than Paxos, and, unlike Paxos, it can take useful actions even when the system partitions or a majority of the nodes fails. However, it also poses a very unusual scheduling problem that involves creating sets of interrelated schedules for different failure modes. We present a scheduling algorithm called Cascade that can quickly find suitable schedules. Using a prototype implementation, we show that Cascade scales far better than a baseline algorithm and reduces the scheduling time from hours to a few seconds, without sacrificing quality.

Frequent coauthors

  • Boon Thau Loo

    30 shared
  • Peter Druschel

    30 shared
  • Wenchao Zhou

    Alibaba Group (China)

    25 shared
  • Alan Mislove

    Northeastern University

    16 shared
  • Krishna P. Gummadi

    15 shared
  • Micah Sherr

    Georgetown University

    14 shared
  • Ang Chen

    Jiangsu University

    14 shared
  • Marcel Dischinger

    Max Planck Institute for Software Systems

    14 shared

Labs

  • Penn Engineering's TeamPI

Education

  • PhD, Computer Science

    Rice University

    2009