Saman Amarasinghe

Associate Professor of Electrical Engineering and Computer Science · Verified

Massachusetts Institute of Technology · Electrical Engineering and Computer Science

Active 1993–2026

h-index: 61

Citations: 18.7k

Papers: 296 (51 in the last 5 years)

Funding: $5.9M (1 active grant)

About

Saman Amarasinghe is the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science at MIT. His research areas include Artificial Intelligence and Machine Learning, Computer Architecture, Programming Languages and Software Engineering, Security and Cryptography, and Systems and Networking. He is involved in developing systems that sense, process, and transmit energy and information, leveraging computational, theoretical, and experimental tools to create groundbreaking sensors, energy transducers, and physical substrates for computation. His work addresses shared challenges facing humanity through innovative system design and analysis.

Research topics

Computer Science
Programming language
Parallel computing
Algorithm
Mathematics
Theoretical computer science
Computational science
Computer architecture
Physics
Operating system
Pure mathematics

Selected publications

Backwards Data-Flow Analysis using Prophecy Variables in the BuildIt System
arXiv (Cornell University) · 2026-01-06
preprint · Open access
Many program transformations and optimizations require information about the future behavior of the program. A standard way to obtain this information is to build an intermediate program representation, then use a backwards program analysis to propagate relevant information against the flow of control back to the transformation/optimization site. We instead propose to use prophecy variables, which predict information about the future execution of the program, to enable such transformations and optimizations. We implement prophecy variables in BuildIt, a lightweight domain specific language implementation system. BuildIt uses staged compilation to implement high performance domain specific languages embedded within a standard general purpose programming language (C++). The BuildIt first phase uses standard C++ program execution to generate optimized C, C++, and CUDA second phase code. This approach enables BuildIt to eliminate programming language implementation components such as parsers and intermediate representations, delivering a dramatic decrease in the engineering effort required to implement domain specific languages. The combination of prophecy variables and repeated forward program execution enables BuildIt to extend this approach to include transformations and optimizations that require information about the future execution of the program without backwards analyses and without the engineering overhead associated with implementing these analyses. We formalize the use of prophecy variables for this purpose, discuss the implementation of prophecy variables and repeated execution in BuildIt, and present experimental results for BuildIt computations that benefit from optimizations enabled by the information that prophecy variables provide.
Publisher DOI
SPAC: Automating FPGA-based Network Switches with Protocol Adaptive Customization
arXiv (Cornell University) · 2026-04-23
preprint · Open access
With network requirements diverging across emerging applications, latency-critical services demand minimal logic delay, while hyperscale training and collectives require sustained line-rate throughput for synchronized bulk transfers. This divergence creates an urgent need for custom network switches tailored to specialized protocols and application-specific traffic patterns. This paper presents SPAC (Switch and Protocol Adaptive Customization), a novel approach that automates the generation of FPGA-based network switches co-optimized for custom protocols and application-specific traffic patterns. SPAC introduces a unified workflow with a domain-specific language (DSL) for protocol-architecture co-design, a library of modular HLS-based adaptive switch components, and a trace-aware Design Space Exploration (DSE) engine. By providing a multi-fidelity simulation stack, SPAC enables rapid identification of Pareto-optimal designs prior to deployment. We demonstrate the efficacy of the domain-specific adaptation of SPAC across a spectrum of real-world scenarios, spanning from latency-sensitive sensor and HFT networks to hyperscale datacenter fabrics. Experimental results show that by tailoring the micro-architecture and protocol to the specific workload, SPAC-generated designs reduce LUT and BRAM usage by 55% and 53%, respectively. Compared to fixed-architecture counterparts, SPAC delivers latency reductions ranging from 7.8% to 38.4% across various tasks while maintaining adequate resource consumption and packet drop rate.
Publisher DOI
Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums
2026-03-10
article · Open access
Programming high-performance sparse GPU kernels is notoriously difficult, requiring both substantial effort and deep expertise. Sparse compilers aim to simplify this process, but existing systems fall short in two key ways. First, they are primarily designed for CPUs and rarely produce high-performance GPU code. Second, when computations involve both sparse and dense regions, these compilers often fail to optimize the dense portions effectively. In this paper, we propose a new approach for expressing sparse computations. We start from format-agnostic Einsums over sparse tensors and rewrite them into format-conscious indirect Einsums, which explicitly encode format information by mapping sparse data and metadata onto dense tensor operations through indirect indexing. To execute indirect Einsums, we introduce the Insum compiler, which generates efficient GPU code for these Einsums by lowering to the PyTorch compiler, extended to better support Tensor Core–enabled indirect Einsums. We also present two fixed-length sparse formats, GroupCOO and BlockGroupCOO, designed to fit naturally with indirect Einsums. Our approach achieves 1.14×–3.81× speedups across a range of sparse GPU applications while reducing lines of code by 202×–4491× compared to hand-written implementations. The source code for Insum is publicly available at https://github.com/nullplay/IndirectEinsum.
Publisher DOI
UniTe: A Universal Tensor Abstraction for Capturing Spatial Relationships
ACM Transactions on Architecture and Code Optimization · 2026-01-10
article · Open access · Senior author
Tensors are an integral part of numerous domains, and while significant effort has been put into the design of tensor data structures in isolation, little attention has been paid to the relationships that exist across tensors and how this affects their representation and use. In this article, we focus on spatial relationships across tensors in a program, where such tensors are defined relative to a common reference coordinate system. These relationships are complicated by the fact that the tensors may differ in their representations, such as having variations in their axes, spacings, origins, and overall shape. Due to the lack of existing abstractions and language support for these types of tensor semantics, users are currently forced to manually perform the bookkeeping necessary to account for these varying relationships and representations. Unfortunately, we cannot rely on a simple library to capture these relationships, as computations on these types of tensors often happen at the innermost levels of programs; we find that the overheads associated with an unoptimized implementation quickly accumulate, leading to performance up to nearly 65x slower than a reference C implementation on a series of image and video compression benchmarks. In this article, we introduce the novel UniTe abstraction, which captures spatial relationships across all such tensors in a program. We also introduce two domain-specific languages and optimizing compilers, CoLa for Python and SHiM for C/C++, built off of UniTe. Both CoLa and SHiM provide users with an intuitive set of tensor primitives based on spatial relationships, hiding the complexity that goes into maintaining the tensors and computing accesses across them. In addition, we discuss the optimizations necessary to remove the associated abstraction overhead and describe their implementations. On the benchmarks, we show that both CoLa and SHiM successfully remove the overheads, achieving performance parity with existing C implementations.
Publisher DOI
Next-Gen Hydroponics: Emerging Technologies and Economic Pathways Reshaping Farming Systems for Global Food Security
Journal of Global Agriculture and Ecology · 2025-10-22
article · Open access · 1st author · Corresponding author
Background: The food security crisis exacerbated by climate change, population growth, and urbanization requires resilient and sustainable agriculture systems. Hydroponics, as a central technology in Controlled Environment Agriculture (CEA), has the potential to be disruptively different from legacy farming with efficient use of water, nutrients, and space. Hydroponic systems can increase the yield by 30-200% depending on the crop or system used, especially in leafy items like lettuce and spinach. Hydroponics use 90-95% less water, in comparison with soil based agriculture because of recirculating the nutrient solution and evaporation differential. Aims: This review of the scientific literature covers the development and technological advancement of hydroponic systems with respect to initially developed strategies like Nutrient Film Technique (NFT), Deep Water Culture (DWC), and Aeroponics, to futuristic use cases with AI, IoT, and automation. Methodology: This study synthesizes information from more than fifty peer-reviewed articles published from 2015 - 2025 focused on specifically, thematic innovations in system design, vertical farming, recycling of resources, and harvesting-encoding environmental control. Discussion: Comparative analyses of hydroponics show evidence of advantages with increased yield and resource use efficiency; however, limits remain in terms of scalability, energy needs, and dependence of know-how. The socioeconomic analysis shows promise in employment opportunities and urban food sovereignty, meanwhile suggests a need to dismantle the rural-urban divide and distributional fairness. Policy, and weak institutional backer remain major limits to implementation. Conclusion: Ultimately, this review shows that hydroponics, with the help of an inclusive policy approach, and interdisciplinary advancements in innovation can be used as a revolutionary strategy for climate-smart sustainable, food systems in local communities on a global scale.
Publisher DOI
The Continuous Tensor Abstraction: Where Indices Are Real
Proceedings of the ACM on Programming Languages · 2025-10-09 · 1 citation
article · Open access · Senior author
This paper introduces the continuous tensor abstraction, allowing indices to take real-number values (e.g., A[3.14]). It also presents continuous tensor algebra expressions, such as $C_{x,y} = A_{x,y} * B_{x,y}$, where indices are defined over a continuous domain. This work expands the traditional tensor model to include continuous tensors. Our implementation supports piecewise-constant tensors, on which infinite domains can be processed in finite time. We also introduce a new tensor format for efficient storage and a code generation technique for automatic kernel generation. For the first time, our abstraction expresses domains like computational geometry and computer graphics in the language of tensor programming. Our approach demonstrates competitive or better performance to hand-optimized kernels in leading libraries across diverse applications. Compared to hand-implemented libraries on a CPU, our compiler-based implementation achieves an average speedup of 9.20× on 2D radius search with ∼60× fewer lines of code (LoC), 1.22× on genomic interval overlapping queries (with ∼18× LoC saving), and 1.69× on trilinear interpolation in Neural Radiance Field (with ∼6× LoC saving).
Publisher DOI
Finch: Sparse and Structured Tensor Programming with Control Flow
Proceedings of the ACM on Programming Languages · 2025-04-09 · 4 citations
article · Open access · Senior author
From FORTRAN to NumPy, tensors have revolutionized how we express computation. However, tensors in these, and almost all prominent systems, can only handle dense rectilinear integer grids. Real world tensors often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Support for structured data is fragmented and incomplete. Existing frameworks limit the tensor structures and program control flow they support to better simplify the problem. In this work, we propose a new programming language, Finch, which supports both flexible control flow and diverse data structures. Finch facilitates a programming model which resolves the challenges of computing over structured tensors by combining control flow and data structures into a common representation where they can be co-optimized. Finch automatically specializes control flow to data so that performance engineers can focus on experimenting with many algorithms. Finch supports a familiar programming language of loops, statements, ifs, breaks, etc., over a wide variety of tensor structures, such as sparsity, run-length-encoding, symmetry, triangles, padding, or blocks. Finch reliably utilizes the key properties of structure, such as structural zeros, repeated values, or clustered non-zeros. We show that this leads to dramatic speedups in operations such as SpMV and SpGEMM, image processing, and graph analytics.
Publisher DOI
Modular GPU Programming with Typed Perspectives
ArXiv.org · 2025-11-14
preprint · Open access
To achieve peak performance on modern GPUs, one must balance two frames of mind: issuing instructions to individual threads to control their behavior, while simultaneously tracking the convergence of many threads acting in concert to perform collective operations like Tensor Core instructions. The tension between these two mindsets makes modular programming error prone. Functions that encapsulate collective operations, despite being called per-thread, must be executed cooperatively by groups of threads. In this work, we introduce Prism, a new GPU language that restores modularity while still giving programmers the low-level control over collective operations necessary for high performance. Our core idea is typed perspectives, which materialize, at the type level, the granularity at which the programmer is controlling the behavior of threads. We describe the design of Prism, implement a compiler for it, and lay its theoretical foundations in a core calculus called Bundl. We implement state-of-the-art GPU kernels in Prism and find that it offers programmers the safety guarantees needed to confidently write modular code without sacrificing performance.
Publisher OA PDF DOI

Recent grants

PPoSS: LARGE: Intel: Combining Learning and Formal Verification for Scalable Machine Programming (ScaMP)
NSF · $2.1M · 2022–2027
XPS: FULL: DSD: Scalable High Performance with Halide and Simit Domain Specific Languages
NSF · $845k · 2015–2020
NGS: StreamIt: A Language and a Compiler for Streaming Applications
NSF · $500k · 2004–2007
CISE Experimental Partnerships: MIT Raw Machine
NSF · $2.1M · 2000–2005
Collaborative Research: Programmable Microfluidics: A Universal Substrate for Biological Computing
NSF · $375k · 2006–2009

Frequent coauthors

William Thies
45 shared
Shoaib Kamil
Adobe Systems (United States)
33 shared
Anant Agarwal
26 shared
Fredrik Kjølstad
Stanford University
23 shared
Qin Zhao
23 shared
Monica S. Lam
21 shared
Rodric Rabbah
21 shared
Jason Ansel
20 shared

Education

Ph.D., Electrical Engineering and Computer Science
Massachusetts Institute of Technology
1993
M.S., Electrical Engineering and Computer Science
Massachusetts Institute of Technology
1989
B.S., Electrical Engineering
University of Moratuwa
1986

Awards & honors

2025-26 EECS Faculty Award Roundup
