
About
Renato Mancuso is an Associate Professor in the Department of Computer Science at Boston University (BU) and the director of the BU Cyber-Physical Systems Lab (CPSLab@BU). He is also affiliated with the BU Department of Electrical and Computer Engineering. He earned his Ph.D. from the University of Illinois at Urbana-Champaign (UIUC) in 2017. His research focuses on real-time and embedded systems, with particular interest in partially-reconfigurable platforms and OS-level multi-core resource management technologies designed for high-performance, safety-critical systems. Additionally, he explores applications and methodologies for designing, deploying, and analyzing Cyber-Physical Systems (CPS), as well as real-time cloud computing. His work also addresses security aspects for embedded systems and technologies related to unmanned aerial vehicles (UAVs). Mancuso actively seeks motivated master's and Ph.D. students to collaborate with him in these research areas.
Research topics
- Computer Science
- Operating system
- Distributed computing
- Embedded system
- Computer architecture
- Computer Security
- Political Science
- Computer network
- Business
- Finance
Selected publications
Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality
arXiv (Cornell University) · 2026-04-14
article · Open access · Senior author
The shift to data-intensive processing from the cloud to the edge has introduced new challenges and expectations for the next generation of intelligent computing systems. As the memory wall continues to grow, modern systems can only meet these performance expectations by displaying data access patterns that exhibit ideal layouts in memory and ideal spatiotemporal locality in caches. However, only a few data-intensive applications are characterized by ideal locality. Instead, most applications exhibit either (i) poor locality when naively implemented and must undergo costly redesigns and tuning or (ii) inflated memory footprint to offer proper locality. To address the aforementioned challenges, we propose a hardware/software co-designed approach that can be implemented on commercially available SoC/FPGA platforms. Our approach seamlessly inserts in the CPUs' data path a Tensor Memory Engine that provides data with an ideal cache locality to running applications by (i) accessing the memory on behalf of the CPUs and (ii) composing a re-organized view of the memory layout. Unlike in- and near-memory computing approaches, it sets itself apart by clearly decoupling computing and memory accesses; computation is still performed on CPUs while the data re-organization is delegated to the Tensor Memory Engine.
CAPA: A Framework for Contention-Aware and Progress-Aware Multi-Core Real-Time Systems
IEEE Transactions on Computers · 2026-01-01
article · Senior author
Heterogeneous Memory Benchmarking Toolkit
ArXiv.org · 2025-05-01
preprint · Open access · Senior author
This paper presents an open-source kernel-level heterogeneous memory characterization framework (MemScope) for embedded systems that enables users to understand and precisely characterize the temporal behavior of all available memory modules under configurable contention stress scenarios. Because the kernel level provides a high degree of control over allocation, cache maintenance, CPUs, interrupts, and I/O device activity, the most accurate way to benchmark heterogeneous memory subsystems is to implement the framework in the kernel. This gives us the privilege to directly map pieces of contiguous physical memory and instantiate allocators, allowing us to finely control cores to create and eliminate interference. Additionally, we can minimize noise and interruptions, guaranteeing more consistent and precise results compared to equivalent user-space solutions. Running our framework on a Xilinx Zynq UltraScale+ ZCU102 CPU-FPGA platform demonstrates its capability to precisely benchmark bandwidth and latency across various memory types, including PL-side DRAM and BRAM, in a multi-core system.
Closing the intent-to-behavior gap via Fulfillment Priority Logic
2025-10-19
article · Senior author
Practitioners designing reinforcement learning policies face a fundamental challenge: translating intended behavioral objectives into representative reward functions. This challenge stems from behavioral intent requiring simultaneous achievement of multiple competing objectives, typically addressed through labor-intensive linear reward composition that yields brittle results. Consider the ubiquitous robotics scenario where performance maximization directly conflicts with energy conservation. Such competitive dynamics are resistant to simple linear reward combinations. In this paper, we present the concept of objective fulfillment upon which we build Fulfillment Priority Logic (FPL). FPL allows practitioners to define logical formulae representing their intentions and priorities within multi-objective reinforcement learning. Our novel Balanced Policy Gradient algorithm leverages FPL specifications to achieve up to 500% better sample efficiency compared to Soft Actor Critic. Notably, this work constitutes the first implementation of a non-linear utility scalarization design, intended explicitly for continuous control problems.
Burning Fetch Execution: A Framework for Zero-Trust Multi-party Confidential Computing
2025-10-21
book-chapter · Open access · Senior author
How can one tamper with data that does not exist? Motivated by this question, we present the Burning Fetch eXecution (BFX) paradigm. Data in-use is vulnerable, and the current focus on encrypting and/or isolating in-use data has fallen short. Frequently reported breaches of “secure” hardware and indispensable overhead with encryption schemes confirm that trust is the modern bottleneck. This work tackles the gap in existing safeguarding technology by avoiding byte-level decryption until it is immediately fetched by the processor, only to burn it right after. We perform on-the-fetch data decryption, immediately followed by burning, i.e., erasing right after processing cycles. Thus, BFX minimizes the existence of sensitive data in-use. BFX neither demands new processing hardware units nor requires restructuring application software. Three pillars set the BFX paradigm apart: (1) zero-trust multi-party confidentiality with (2) security rooted in transparency, and (3) high performance. By tackling the root of the issue, BFX enables a zero-trust multi-partied sharing without showing scenarios that were previously unthinkable. We showcase the impact of the BFX in a scenario with a highly privileged cloud insider attacker present. We exercise a sensitive mission whereby a third-party cloud processes fourth-party confidential real-time data streamed by a drone swarm. To further highlight the zero-trust nature of BFX, we assume the inference model (code) stream-processing on swarm data to be top-secret and owned by yet another party. The unknown threat, however, is the compromised processing system (cloud) where sensitive code and data are about to be deployed by all other parties, thanks to misplaced trust.
Surgical Software-less I/O Virtualization
2025-04-23
article · Open access · Senior author
Virtualizing I/O devices presents unique challenges compared to other system resources. Traditional approaches rely on software-based abstraction layers, which can be particularly complex to develop and often fail to provide guests with efficient access to the underlying hardware. The most common solutions involve dedicating an I/O device exclusively to a single guest or modifying/patching the guest software to utilize hardware-level I/O multiplexing. This paper introduces I/O Softwareless Nano-Virtualization (IO-SNV), a novel approach that achieves efficient and transparent I/O virtualization with minimal overhead and no software modifications. IO-SNV operates entirely at the hardware level, leveraging programmable logic to dynamically virtualize I/O devices while maintaining high performance. We present the conceptual model, a proof-of-concept implementation, and an evaluation of its feasibility. Our promising results demonstrate that IO-SNV can provide seamless I/O virtualization while preserving device access efficiency, making it a compelling alternative to existing software-centric solutions.
Light Virtualization: a proof-of-concept for hardware-based virtualization
ArXiv.org · 2025-02-06
preprint · Open access · Senior author
Virtualization has become widespread across all computing environments, from edge devices to cloud systems. Its main advantages are resource management through abstraction and improved isolation of platform resources and processes. However, there are still some important tradeoffs as it requires significant support from the existing hardware infrastructure and negatively impacts performance. Additionally, the current approaches to resource virtualization are inflexible, using a model that doesn't allow for dynamic adjustments during operation. This research introduces Light Virtualization (LightV), a new virtualization method for commercial platforms. LightV uses programmable hardware to direct cache coherence traffic, enabling precise and seamless control over which resources are virtualized. The paper explains the core principles of LightV, explores its capabilities, and shares initial findings from a basic proof-of-concept module tested on commercial hardware.
MEMSCOPE: Open-Source Kernel-Level Framework for Heterogeneous Memory Characterization
2025-12-02
article · Senior author
This paper presents an open-source kernel-level heterogeneous memory characterization framework (MemScope) for embedded systems. MemScope enables precise characterization of the temporal behavior of available memory modules under configurable contention stress scenarios. MemScope leverages kernel-level control over physical memory allocation, cache maintenance, CPU state, interrupts, and I/O device activity to accurately benchmark heterogeneous memory subsystems. This gives us the privilege to directly map pieces of contiguous physical memory and instantiate allocators, allowing us to finely control cores to create and eliminate interference. Additionally, we can minimize noise and interruptions, guaranteeing more consistent and precise results compared to equivalent user-space solutions. Running our framework on a Xilinx Zynq UltraScale+ ZCU102 CPU-FPGA platform demonstrates its capability to precisely benchmark bandwidth and latency across various memory types, including PL-side DRAM and BRAM, in a multi-core system.
2025-01-03 · 1 citation
article
Recent advancements in electric and hybrid-electric aircraft have sparked interest in Advanced Aerial Mobility (AAM), enabling innovative air transport solutions. Distributed Electric Propulsion (DEP), with its favorable aero-propulsive interactions, is a key focus for advanced aircraft designs, and thus an area of significant interest for academia and industry. Wayfarer Aircraft has developed and patented a novel form of blown wing lift augmentation called the Integrated High Lift Propulsor (IHLP), which uses DEP integrated with a Krueger slat or flap. This paper presents the instrumented, reconfigurable subscale IHLP distributed electric propulsion testbed that will be used to parameterize the performance and handling qualities (P&HQ) of IHLP. The subscale aircraft, which was developed from a 35%-scale Cessna 182 R/C scale model aircraft, was instrumented with a custom-integrated flight control and data acquisition system developed to enable this flight research, specifically a Holybro Pixhawk 6X autopilot running custom flight control software coupled with a custom version of the Al Volo FDAQ data acquisition system. The paper presents IHLP, including its advantages and performance improvements in high-lift and cruise conditions compared to conventional distributed electric propulsion (DEP) systems. The detailed development of the IHLP subscale testbed is presented, including modifications, instrumentation design, and the integration of the data acquisition and flight control systems. Preliminary results from ground testing are presented to demonstrate the capabilities of the instrumented IHLP testbed.
Frequent coauthors
- Marco Caccamo · Technical University of Munich · 46 shared
- Rodolfo Pellizzoni · University of Waterloo · 22 shared
- Rohan Tabish · University of Illinois Urbana-Champaign · 16 shared
- Or D. Dantsker · Indiana University · 14 shared
- Shahin Roozkhosh · 13 shared
- Heechul Yun · 12 shared
- Denis Hoornaert · Technical University of Munich · 11 shared
- Kate Saenko · 10 shared
Education
- Ph.D., Computer Science · University of Illinois at Urbana-Champaign · 2017
- M.S., Computer Engineering · University of Rome ‘Tor Vergata’ · 2012
- B.S., Computer Engineering · University of Rome ‘Tor Vergata’ · 2009
Awards & honors
- NSF CAREER Award (2023)
- Gerald and Deanne Gitner Family Award for Innovation in Teac…
- multiple Best (Student) Paper awards
- Outstanding Paper award
- Best Demo award