Saman Amarasinghe

Associate Professor of Electrical Engineering and Computer Science · Verified

Massachusetts Institute of Technology · Electrical Engineering and Computer Science

Active 1993–2026

h-index: 61
Citations: 18.7k
Papers: 296 (51 in last 5y)
Funding: $5.9M (1 active)

About

Saman Amarasinghe is the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science at MIT. His research areas include Artificial Intelligence and Machine Learning, Computer Architecture, Programming Languages and Software Engineering, Security and Cryptography, and Systems and Networking. He is involved in developing systems that sense, process, and transmit energy and information, leveraging computational, theoretical, and experimental tools to create groundbreaking sensors, energy transducers, and physical substrates for computation. His work addresses shared challenges facing humanity through innovative system design and analysis.

Research topics

  • Computer Science
  • Programming language
  • Parallel computing
  • Algorithm
  • Mathematics
  • Theoretical computer science
  • Computational science
  • Computer architecture
  • Physics
  • Operating system
  • Pure mathematics

Selected publications

  • Backwards Data-Flow Analysis using Prophecy Variables in the BuildIt System

    arXiv (Cornell University) · 2026-01-06

    preprint · Open access

    Many program transformations and optimizations require information about the future behavior of the program. A standard way to obtain this information is to build an intermediate program representation, then use a backwards program analysis to propagate relevant information against the flow of control back to the transformation/optimization site. We instead propose to use prophecy variables, which predict information about the future execution of the program, to enable such transformations and optimizations. We implement prophecy variables in BuildIt, a lightweight domain specific language implementation system. BuildIt uses staged compilation to implement high performance domain specific languages embedded within a standard general purpose programming language (C++). The BuildIt first phase uses standard C++ program execution to generate optimized C, C++, and CUDA second phase code. This approach enables BuildIt to eliminate programming language implementation components such as parsers and intermediate representations, delivering a dramatic decrease in the engineering effort required to implement domain specific languages. The combination of prophecy variables and repeated forward program execution enables BuildIt to extend this approach to include transformations and optimizations that require information about the future execution of the program without backwards analyses and without the engineering overhead associated with implementing these analyses. We formalize the use of prophecy variables for this purpose, discuss the implementation of prophecy variables and repeated execution in BuildIt, and present experimental results for BuildIt computations that benefit from optimizations enabled by the information that prophecy variables provide.

  • SPAC: Automating FPGA-based Network Switches with Protocol Adaptive Customization

    arXiv (Cornell University) · 2026-04-23

    preprint · Open access

    With network requirements diverging across emerging applications, latency-critical services demand minimal logic delay, while hyperscale training and collectives require sustained line-rate throughput for synchronized bulk transfers. This divergence creates an urgent need for custom network switches tailored to specialized protocols and application-specific traffic patterns. This paper presents SPAC (Switch and Protocol Adaptive Customization), a novel approach that automates the generation of FPGA-based network switches co-optimized for custom protocols and application-specific traffic patterns. SPAC introduces a unified workflow with a domain-specific language (DSL) for protocol-architecture co-design, a library of modular HLS-based adaptive switch components, and a trace-aware Design Space Exploration (DSE) engine. By providing a multi-fidelity simulation stack, SPAC enables rapid identification of Pareto-optimal designs prior to deployment. We demonstrate the efficacy of the domain-specific adaptation of SPAC across a spectrum of real-world scenarios, spanning from latency-sensitive sensor and HFT networks to hyperscale datacenter fabrics. Experimental results show that by tailoring the micro-architecture and protocol to the specific workload, SPAC-generated designs reduce LUT and BRAM usage by 55% and 53%, respectively. Compared to fixed-architecture counterparts, SPAC delivers latency reductions ranging from 7.8% to 38.4% across various tasks while maintaining adequate resource consumption and packet drop rate.

  • Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums

    2026-03-10

    article · Open access

    Programming high-performance sparse GPU kernels is notoriously difficult, requiring both substantial effort and deep expertise. Sparse compilers aim to simplify this process, but existing systems fall short in two key ways. First, they are primarily designed for CPUs and rarely produce high-performance GPU code. Second, when computations involve both sparse and dense regions, these compilers often fail to optimize the dense portions effectively. In this paper, we propose a new approach for expressing sparse computations. We start from format-agnostic Einsums over sparse tensors and rewrite them into format-conscious indirect Einsums, which explicitly encode format information by mapping sparse data and metadata onto dense tensor operations through indirect indexing. To execute indirect Einsums, we introduce the Insum compiler, which generates efficient GPU code for these Einsums by lowering to the PyTorch compiler, extended to better support Tensor Core–enabled indirect Einsums. We also present two fixed-length sparse formats, GroupCOO and BlockGroupCOO, designed to fit naturally with indirect Einsums. Our approach achieves 1.14×–3.81× speedups across a range of sparse GPU applications while reducing lines of code by 202×–4491× compared to hand-written implementations. The source code for Insum is publicly available at https://github.com/nullplay/IndirectEinsum.

  • UniTe: A Universal Tensor Abstraction for Capturing Spatial Relationships

    ACM Transactions on Architecture and Code Optimization · 2026-01-10

    article · Open access · Senior author

    Tensors are an integral part of numerous domains, and while significant effort has been put into the design of tensor data structures in isolation, little attention has been paid to the relationships that exist across tensors and how this affects their representation and use. In this article, we focus on spatial relationships across tensors in a program, where such tensors are defined relative to a common reference coordinate system. These relationships are complicated by the fact that the tensors may differ in their representations, such as having variations in their axes, spacings, origins, and overall shape. Due to the lack of existing abstractions and language support for these types of tensor semantics, users are currently forced to manually perform the bookkeeping necessary to account for these varying relationships and representations. Unfortunately, we cannot rely on a simple library to capture these relationships, as computations on these types of tensors often happen at the innermost levels of programs; we find that the overheads associated with an unoptimized implementation quickly accumulate, leading to performance up to nearly 65x slower than a reference C implementation on a series of image and video compression benchmarks. In this article, we introduce the novel UniTe abstraction, which captures spatial relationships across all such tensors in a program. We also introduce two domain-specific languages and optimizing compilers, CoLa for Python and SHiM for C/C++, built off of UniTe. Both CoLa and SHiM provide users with an intuitive set of tensor primitives based on spatial relationships, hiding the complexity that goes into maintaining the tensors and computing accesses across them. In addition, we discuss the optimizations necessary to remove the associated abstraction overhead and describe their implementations. On the benchmarks, we show that both CoLa and SHiM successfully remove the overheads, achieving performance parity with existing C implementations.

  • Next-Gen Hydroponics: Emerging Technologies and Economic Pathways Reshaping Farming Systems for Global Food Security

    Journal of Global Agriculture and Ecology · 2025-10-22

    article · Open access · 1st author · Corresponding

    Background: The food security crisis exacerbated by climate change, population growth, and urbanization requires resilient and sustainable agriculture systems. Hydroponics, as a central technology in Controlled Environment Agriculture (CEA), has the potential to be disruptively different from legacy farming with efficient use of water, nutrients, and space. Hydroponic systems can increase the yield by 30-200% depending on the crop or system used, especially in leafy items like lettuce and spinach. Hydroponics use 90-95% less water, in comparison with soil based agriculture because of recirculating the nutrient solution and evaporation differential. Aims: This review of the scientific literature covers the development and technological advancement of hydroponic systems with respect to initially developed strategies like Nutrient Film Technique (NFT), Deep Water Culture (DWC), and Aeroponics, to futuristic use cases with AI, IoT, and automation. Methodology: This study synthesizes information from more than fifty peer-reviewed articles published from 2015 - 2025 focused on specifically, thematic innovations in system design, vertical farming, recycling of resources, and harvesting-encoding environmental control. Discussion: Comparative analyses of hydroponics show evidence of advantages with increased yield and resource use efficiency; however, limits remain in terms of scalability, energy needs, and dependence of know-how. The socioeconomic analysis shows promise in employment opportunities and urban food sovereignty, meanwhile suggests a need to dismantle the rural-urban divide and distributional fairness. Policy, and weak institutional backer remain major limits to implementation. Conclusion: Ultimately, this review shows that hydroponics, with the help of an inclusive policy approach, and interdisciplinary advancements in innovation can be used as a revolutionary strategy for climate-smart sustainable, food systems in local communities on a global scale.

  • The Continuous Tensor Abstraction: Where Indices Are Real

    Proceedings of the ACM on Programming Languages · 2025-10-09 · 1 citation

    article · Open access · Senior author

    This paper introduces the continuous tensor abstraction, allowing indices to take real-number values (e.g., A[3.14]). It also presents continuous tensor algebra expressions, such as $C_{x,y} = A_{x,y} \ast B_{x,y}$, where indices are defined over a continuous domain. This work expands the traditional tensor model to include continuous tensors. Our implementation supports piecewise-constant tensors, on which infinite domains can be processed in finite time. We also introduce a new tensor format for efficient storage and a code generation technique for automatic kernel generation. For the first time, our abstraction expresses domains like computational geometry and computer graphics in the language of tensor programming. Our approach demonstrates competitive or better performance to hand-optimized kernels in leading libraries across diverse applications. Compared to hand-implemented libraries on a CPU, our compiler-based implementation achieves an average speedup of 9.20× on 2D radius search with ∼60× fewer lines of code (LoC), 1.22× on genomic interval overlapping queries (with ∼18× LoC saving), and 1.69× on trilinear interpolation in Neural Radiance Field (with ∼6× LoC saving).

  • Finch: Sparse and Structured Tensor Programming with Control Flow

    Proceedings of the ACM on Programming Languages · 2025-04-09 · 4 citations

    article · Open access · Senior author

    From FORTRAN to NumPy, tensors have revolutionized how we express computation. However, tensors in these, and almost all prominent systems, can only handle dense rectilinear integer grids. Real world tensors often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Support for structured data is fragmented and incomplete. Existing frameworks limit the tensor structures and program control flow they support to better simplify the problem. In this work, we propose a new programming language, Finch, which supports both flexible control flow and diverse data structures. Finch facilitates a programming model which resolves the challenges of computing over structured tensors by combining control flow and data structures into a common representation where they can be co-optimized. Finch automatically specializes control flow to data so that performance engineers can focus on experimenting with many algorithms. Finch supports a familiar programming language of loops, statements, ifs, breaks, etc., over a wide variety of tensor structures, such as sparsity, run-length-encoding, symmetry, triangles, padding, or blocks. Finch reliably utilizes the key properties of structure, such as structural zeros, repeated values, or clustered non-zeros. We show that this leads to dramatic speedups in operations such as SpMV and SpGEMM, image processing, and graph analytics.

  • Modular GPU Programming with Typed Perspectives

    ArXiv.org · 2025-11-14

    preprint · Open access

    To achieve peak performance on modern GPUs, one must balance two frames of mind: issuing instructions to individual threads to control their behavior, while simultaneously tracking the convergence of many threads acting in concert to perform collective operations like Tensor Core instructions. The tension between these two mindsets makes modular programming error prone. Functions that encapsulate collective operations, despite being called per-thread, must be executed cooperatively by groups of threads. In this work, we introduce Prism, a new GPU language that restores modularity while still giving programmers the low-level control over collective operations necessary for high performance. Our core idea is typed perspectives, which materialize, at the type level, the granularity at which the programmer is controlling the behavior of threads. We describe the design of Prism, implement a compiler for it, and lay its theoretical foundations in a core calculus called Bundl. We implement state-of-the-art GPU kernels in Prism and find that it offers programmers the safety guarantees needed to confidently write modular code without sacrificing performance.

Education

  • Ph.D., Electrical Engineering and Computer Science

    Massachusetts Institute of Technology

    1993
  • M.S., Electrical Engineering and Computer Science

    Massachusetts Institute of Technology

    1989
  • B.S., Electrical Engineering

    University of Moratuwa

    1986

Awards & honors

  • 2025-26 EECS Faculty Award Roundup