
Maria Apostolaki
Associated Faculty · Verified · Princeton University · Computer Science
Active 1998–2026
About
Maria Apostolaki is an Assistant Professor of Electrical and Computer Engineering at Princeton University. Her research focuses on designing and building networked systems that are secure, reliable, and performant. Her work draws from multiple disciplines including networking, security, blockchain, and machine learning, addressing the complex interdependent components and layers of real-world systems. She has contributed to various areas such as internet path selection, routing attacks on cryptocurrencies, and data plane connectivity recovery. Her efforts have been recognized with awards such as the N2Women Rising Stars in Networking and Communications in 2021 and the IRTF/IETF Applied Networking Research Prize in 2018.
Research topics
- Environmental science
- Chemistry
- Environmental chemistry
- Computer Science
- Environmental health
- Telecommunications
- Chromatography
- Computer network
- Real-time computing
- Ecology
- Engineering
- Geography
- Atmospheric sciences
- Meteorology
Selected publications
Passive Data-Plane Telemetry to Mitigate Long-Distance BGP Hijacks
ArXiv.org · 2026-01-01
article · Open access
Poor security of Internet routing enables adversaries to divert user data through unintended infrastructures in attacks known as hijacks. Of particular concern - and the focus of this paper - are cases where attackers reroute domestic traffic through foreign countries and still deliver it to the intended destination, exposing traffic to surveillance, bypassing legal privacy protections, and posing national security threats. Efforts to detect and mitigate such attacks have focused primarily on the control plane, while data-plane signals remain largely overlooked. In this paper, we argue that passively-monitored round-trip time (RTT) - and, in particular, changes in its propagation-delay component - offers a promising signal for detection: the increased propagation delay is unavoidable and directly observable from affected networks, enabling opportunities for faster detection and mitigation. We explore the practicality of using RTT variations for hijack detection, addressing two key questions: (1) What coverage can this provide, given its heavy dependence on the geolocations of the sender, receiver, and adversary? and (2) Can an always-on RTT-based detection system be deployed without disrupting normal network operations? Focusing on cross-country interception attacks, we find that coverage is high: 97% under ideal (i.e., data travels at the speed of light) conditions, and 91% and 86% with real traffic from two datasets. To demonstrate practicality, we design HiDe, which reliably detects delay surges from long-distance hijacks at line rate using commodity programmable hardware. We measure HiDe’s accuracy and false-positive rate on real-world data and validate it with ethically conducted hijacks.
ArXiv.org · 2026-02-11
article · Open access · Senior author
All-to-all collective communication is a core primitive in distributed machine learning and high-performance computing. At the server scale, the communication demands of these workloads are increasingly outstripping the bandwidth and energy limits of electrical interconnects, driving a growing interest in photonic interconnects. However, leveraging these interconnects for all-to-all communication is nontrivial. The core challenge lies in jointly optimizing a sequence of topologies and flow schedules, reconfiguring only when the transmission savings from traversing shorter paths outweigh the reconfiguration cost. Yet the search space of this joint optimization is enormous. Existing work sidesteps this challenge by making unrealistic assumptions on reconfiguration costs so that it is never or always worthwhile to reconfigure. In this paper, we show that any candidate sequence of topologies and flow schedules can be expressed as a sum of adjacency matrices and their powers. This abstraction captures the entire solution space and yields a lower bound on all-to-all completion time. Building on this formulation, we identify a family of topology sequences with strong symmetry and high expansion that admits bandwidth-efficient schedules, which our algorithm constructs with low computational overhead. Together, these insights allow us to efficiently construct near-optimal solutions, effectively avoiding enumeration of the combinatorial design space. Evaluation shows that our approach reduces all-to-all completion time by up to 44% on average across a wide range of network parameters, message sizes and workload types.
Open MIND · 2026-02-11
preprint · Senior author
Leibniz-Zentrum für Informatik (Schloss Dagstuhl) · 2026-01-03
other · Open access
Code repository for our NINeS 2026 paper titled "Passive Data-Plane Telemetry to Mitigate Long-Distance BGP Hijacks".
Global BGP Attacks that Evade Route Monitoring
Lecture notes in computer science · 2025-01-01 · 3 citations
book-chapter
Making Logic a First-Class Citizen in Generative ML for Networking
ArXiv.org · 2025-06-30
preprint · Open access · Senior author
Generative ML models are increasingly popular in networking for tasks such as telemetry imputation, prediction, and synthetic trace generation. Despite their capabilities, they suffer from two shortcomings: (i) their output often visibly violates well-known networking rules, which undermines their trustworthiness; and (ii) they are difficult to control, frequently requiring retraining even for minor changes. To address these limitations and unlock the benefits of generative models for networking, we propose a new paradigm for integrating explicit network knowledge, in the form of first-order logic rules, into ML models used for networking tasks. Rules capture well-known relationships among observed signals, e.g., that increased latency precedes packet loss. While the idea is conceptually straightforward, its realization is challenging: networking knowledge is rarely formalized into rules, and naively injecting rules into ML models often hampers their effectiveness. This paper introduces NetNomos, a multi-stage framework that (i) learns rules directly from data (e.g., measurements); (ii) filters them to select semantically meaningful ones; and (iii) enforces them through collaborative generation between an ML model and a Satisfiability Modulo Theories (SMT) solver. We show that NetNomos learns diverse, meaningful rules from four real-world datasets and is 1.6–6.5× more scalable than DuoAI, a state-of-the-art (SOTA) rule-learning method. By enforcing these rules on a generic GPT-2 model, NetNomos achieves performance on par with or even surpassing specialized SOTA systems such as Zoom2Net and NetShare across three networking tasks: telemetry imputation, traffic forecasting, and synthetic data generation.
Just-in-Time Logic Enforcement
2025-11-17
article · Open access · Senior author
While ML can greatly aid network management, it often makes glaring mistakes that contradict common sense or domain-specific constraints, undermining its trustworthiness and hindering adoption. To address this mismatch, this paper advocates enforcing logic during ML inference (i.e., just in time), rather than during training or post-inference as in prior work. We find that this approach offers correctness guarantees without sacrificing statistical fidelity, thereby maximizing the benefits of both ML and formal reasoning.
Routing Attacks in Ethereum PoS: A Systematic Exploration
ArXiv.org · 2025-05-12
preprint · Open access · Senior author
With the promise of greater decentralization and sustainability, Ethereum transitioned from a Proof-of-Work (PoW) to a Proof-of-Stake (PoS) consensus mechanism. The new consensus protocol introduces novel vulnerabilities that warrant further investigation. The goal of this paper is to investigate the security of Ethereum's PoS system from an Internet routing perspective. To this end, this paper makes two contributions: First, we devise a novel framework for inferring the distribution of validators on the Internet without disturbing the real network. Second, we introduce a class of network-level attacks on Ethereum's PoS system that jointly exploit Internet routing vulnerabilities with the protocol's reward and penalty mechanisms. We describe two representative attacks: StakeBleed, where the attacker triggers an inactivity leak, halting block finality and causing financial losses for all validators; and KnockBlock, where the attacker increases her expected MEV gains by preventing targeted blocks from being included in the chain. We find that both attacks are practical and effective. An attacker executing StakeBleed can inflict losses of almost 300 ETH in just 2 hours by hijacking as few as 30 IP prefixes. An attacker implementing KnockBlock could increase their expected MEV gains by 44.5% while hijacking a single prefix for less than 2 minutes. Our paper serves as a call to action for validators to reinforce their Internet routing infrastructure and for the Ethereum P2P protocol to implement stronger mechanisms to conceal validator locations.
Mitigating Inter-datacenter Incast with a Proxy
2025-11-17
article · Open access · Senior author
Many-to-one communication (i.e., incast) is a long-standing challenge in networking with a wide range of proposed solutions. However, as incast-inducing applications today (e.g., storage, ML training) scale beyond a single datacenter, they introduce new challenges that current solutions do not handle. In particular, inter-datacenter links have orders of magnitude higher latency than intra-datacenter paths, lengthening the feedback loop that senders rely on to adjust their sending rates and drastically increasing incast completion times.
ArXiv.org · 2025-08-15
preprint · Open access · Senior author
Synthetic network data generators (SynNetGens) are increasingly used to share realistic traffic traces without exposing sensitive raw data. While substantial effort has gone into improving fidelity, privacy is either assumed to be a built-in property of synthesis or addressed through differential privacy at the packet or flow level. This paper uncovers a fundamental privacy vulnerability: SynNetGens preserve cross-flow behavioral correlations that expose source-level membership, allowing an attacker to determine whether traffic of a specific user or service was included in the training data. This leakage arises from a mismatch in abstraction: existing SynNetGens operate and are protected at the packet or flow level, while sensitive information is encoded in correlations across flows from the same source. To demonstrate that this vulnerability is exploitable in practice, we develop TraceBleed, the first source-level membership inference attack against black-box SynNetGens. Our evaluation spans five datasets and six SynNetGens, revealing that: (i) every generator leaks source-level information on at least some datasets; (ii) flow- or packet-level differential privacy fails to protect source privacy unless fidelity is degraded to unusable levels; and (iii) releasing 10X more synthetic data amplifies leakage by 130% on average. To support ongoing research in this area, we will maintain a public privacy-fidelity leaderboard so practitioners can choose generators that fit their needs and researchers can benchmark new designs faithfully.
Recent grants
IMR: MM-1C: Fine-grained Network Monitoring via Software Imputation
NSF · $600k · 2023–2027
Frequent coauthors
- George Kollias · National and Kapodistrian University of Athens · 24 shared
- Euripides G. Stephanou · Cyprus Institute · 21 shared
- Laurent Vanbever · ETH Zurich · 19 shared
- Christina Eftychi · Cologne Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases · 10 shared
- Maria Alexiou · 9 shared
- Aliki Iniotaki · Laiko General Hospital of Athens · 9 shared
- Marietta Armaka · Alexander Fleming Biomedical Sciences Research Center · 9 shared
- Niki Karagianni · 9 shared
Labs
Awards & honors
- N2Women Rising Stars in Networking and Communications (2021)
- IRTF/IETF Applied Networking Research Prize (2018)