Tingyi "Leo" Liu

Professor · Verified

University of Massachusetts Amherst · Materials Science and Engineering

Active 2008–2026

h-index: 15
Citations: 1.1k
Papers: 47 (20 in the last 5 years)
Funding: $1.1M
About

Tingyi 'Leo' Liu is an Associate Professor in the Department of Mechanical and Industrial Engineering at UMass Amherst, affiliated with the Riccio College of Engineering. His research focuses on bio-inspired soft robotics, soft electronics, and medical devices, as well as advanced manufacturing, heterogeneous integration, and roll-to-roll (R2R) manufacturing. He also works on super-repellent surfaces, superomniphobic surfaces, superhydrophobic surfaces, and liquid-metal-based micro/nano-devices. Dr. Liu holds a PhD and MS in Mechanical Engineering from the University of California, Los Angeles (UCLA), and a Bachelor's degree in Electrical Engineering from Zhejiang University in China. His work involves interdisciplinary interface engineering, contributing to healthcare and biomedicine, advanced manufacturing, and supply chain management.

Research topics

  • Computer Science
  • Data Mining
  • Theoretical computer science
  • Artificial Intelligence
  • Algorithm
  • Programming language
  • Computer architecture
  • Distributed computing
  • Mathematics
  • Parallel computing
  • Operating system

Selected publications

  • Long-term Monitoring of Kernel and Hardware Events to Understand Latency Variance

    ArXiv.org · 2026-01-15

    article · Open access

    This paper presents our experience to understand latency variance caused by kernel and hardware events, which are often invisible at the application level. For this purpose, we have built VarMRI, a tool chain to monitor and analyze those events in the long term. To mitigate the "big data" problem caused by long-term monitoring, VarMRI selectively records a subset of events following two principles: it only records events that are affecting the requests recorded by the application; it records coarse-grained information first and records additional information only when necessary. Furthermore, VarMRI introduces an analysis method that is efficient on large amount of data, robust on different data set and against missing data, and informative to the user. VarMRI has helped us to carry out a 3,000-hour study of six applications and benchmarks on CloudLab. It reveals a wide variety of events causing latency variance, including interrupt preemption, Java GC, pipeline stall, NUMA balancing etc.; simple optimization or tuning can reduce tail latencies by up to 31%. Furthermore, the impacts of some of these events vary significantly across different experiments, which confirms the necessity of long-term monitoring.

  • AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure

    ArXiv.org · 2025-02-22

    preprint · Open access

    We introduce AIBrix, a cloud-native, open-source framework designed to optimize and simplify large-scale LLM deployment in cloud environments. Unlike traditional cloud-native stacks, AIBrix follows a co-design philosophy, ensuring every layer of the infrastructure is purpose-built for seamless integration with inference engines like vLLM. AIBrix introduces several key innovations to reduce inference costs and enhance performance including high-density LoRA management for dynamic adapter scheduling, LLM-specific autoscalers, and prefix-aware, load-aware routing. To further improve efficiency, AIBrix incorporates a distributed KV cache, boosting token reuse across nodes, leading to a 50% increase in throughput and a 70% reduction in inference latency. AIBrix also supports unified AI runtime which streamlines model management while maintaining vendor-agnostic engine compatibility. For large-scale multi-node inference, AIBrix employs hybrid orchestration -- leveraging Kubernetes for coarse-grained scheduling and Ray for fine-grained execution -- to balance efficiency and flexibility. Additionally, an SLO-driven GPU optimizer dynamically adjusts resource allocations, optimizing heterogeneous serving to maximize cost efficiency while maintaining service guarantees. Finally, AIBrix enhances system reliability with AI accelerator diagnostic tools, enabling automated failure detection and mock-up testing to improve fault resilience. AIBrix is available at https://github.com/vllm-project/aibrix.

  • An Empirical Study of Microscaling Formats for Low-Precision LLM Training

    2025-05-04 · 3 citations

    article

    This paper presents a comprehensive evaluation of microscaling (MX) quantization in the pre-training of large language models (LLMs), investigating its potential to enhance the computation and memory efficiencies. We systematically examine the effects of key design parameters - including data types, rounding modes, scaling strategies, granularity, and organization - on numerical accuracy and training stability. Our extensive experimental study on Llama3 models reveals critical insights into the challenges of 4-bit training for LLMs and identifies optimal configurations with mixed precisions of 4-bit and 6-bit MX formats that significantly enhance training quality, bridging the gap with higher-precision formats. This research provides valuable guidance on the benefits and limitations of MX quantization, laying the groundwork for future innovations in low-precision LLM training.

  • Exploring Performance and Cost Optimization with ASIC-Based CXL Memory

    2024-04-18 · 31 citations

    article · Open access

    As memory-intensive applications continue to drive the need for advanced architectural solutions, Compute Express Link (CXL) has risen as a promising interconnect technology that enables seamless high-speed, low-latency communication between host processors and various peripheral devices. In this study, we explore the application performance of ASIC CXL memory in various data-center scenarios. We then further explore multiple potential impacts (e.g., throughput, latency, and cost reduction) of employing CXL memory via carefully designed policies and strategies. Our empirical results show the high potential of CXL memory, reveal multiple intriguing observations of CXL memory and contribute to the wide adoption of CXL memory in real-world deployment environments. Based on our benchmarks, we also develop an Abstract Cost Model that can estimate the cost benefit from using CXL memory.

  • AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

    2024-10-26 · 3 citations

    preprint · Open access · Senior author

    In the domain of multimedia and multimodal processing, the efficient handling of diverse data streams such as images, video, and sensor data is paramount. Model compression and multitask learning (MTL) are crucial in this field, offering the potential to address the resource-intensive demands of processing and interpreting multiple forms of media simultaneously. However, effectively compressing a multitask model presents significant challenges due to the complexities of balancing sparsity allocation and accuracy performance across multiple tasks. To tackle these challenges, we propose AdapMTL, an adaptive pruning framework for MTL models. AdapMTL leverages multiple learnable soft thresholds independently assigned to the shared backbone and the task-specific heads to capture the nuances in different components' sensitivity to pruning. During training, it co-optimizes the soft thresholds and MTL model weights to automatically determine the suitable sparsity level at each component to achieve both high task accuracy and high overall sparsity. It further incorporates an adaptive weighting mechanism that dynamically adjusts the importance of task-specific losses based on each task's robustness to pruning. We demonstrate the effectiveness of AdapMTL through comprehensive experiments on popular multitask datasets, namely NYU-v2 and Tiny-Taskonomy, with different architectures, showcasing superior performance compared to state-of-the-art pruning methods.

  • Scaler: Efficient and Effective Cross Flow Analysis

    2024-10-18

    article · Open access · Senior author

    Performance analysis is challenging as different components (e.g., different libraries, and applications) of a complex system can interact with each other. However, few existing tools focus on understanding such interactions. To bridge this gap, we propose a novel analysis method, "Cross Flow Analysis (XFA)", that monitors the interactions/flows across these components. We also built the Scaler profiler that provides a holistic view of the time spent on each component (e.g., library or application) and every API inside each component. This paper proposes multiple new techniques, such as Universal Shadow Table, and Relation-Aware Data Folding. These techniques enable Scaler to achieve low runtime overhead, low memory overhead, and high profiling accuracy. Based on our extensive experimental results, Scaler detects multiple unknown performance issues inside widely-used applications, and therefore will be a useful complement to existing work.

  • Understanding and Alleviating Memory Consumption in RLHF for LLMs

    arXiv (Cornell University) · 2024-10-21

    preprint · Open access · Senior author

    Fine-tuning with Reinforcement Learning with Human Feedback (RLHF) is essential for aligning large language models (LLMs). However, RLHF often encounters significant memory challenges. This study is the first to examine memory usage in the RLHF context, exploring various memory management strategies and unveiling the reasons behind excessive memory consumption. Additionally, we introduce a simple yet effective approach that substantially reduces the memory required for RLHF fine-tuning.

  • Improving Resource and Energy Efficiency for Cloud 3D through Excessive Rendering Reduction

    2024-04-18 · 1 citation

    article · Open access

    The rise of cloud gaming makes interactive 3D applications an emerging type of data center workload. However, the excessive rendering in current cloud 3D systems leads to large gaps between the cloud and client frame rates (FPS, frames per second), thus wasting resources and power. Although FPS regulation can remove excessive rendering, due to the highly-varying frame processing time and the use of rendering delays, existing cloud FPS regulation solutions have low FPS and slow motion-to-photon (MtP) latency, causing violations of Quality-of-Service (QoS) requirements.

  • Profile Dynamic Memory Allocation in Autonomous Driving Software

    2023-08-10 · 1 citation

    article · Senior author

    The software-defined vehicle has driven the autonomy and electrification of the automotive industry. A technical challenge for software designers is how to leverage existing software from AI research and autonomous driving (AD) development and make it useful, reliable, and efficient for customer requirements and functional safety standards. However, the software is critical in autonomous driving (AD) systems, where it should ensure reliability and real-time guarantee simultaneously. Further, the AD industry may re-utilize existing mature software implementation (e.g., C++ STL libraries) in order to accelerate development. However, the jeopardy of reliability and real-time guarantee caused by dynamic memory management inside remains a major concern for practitioners in the field. This paper presents a software tool (called MemTrace) to conveniently analyze the dynamic memory management behavior of AD software and provide important analytical results for software designers to make judgments on software quality, run-time efficiency, and safety with high confidence. MemTrace relies on interception and instrumentation for profiling the explicit allocation behavior of general software, as well as the implicit memory allocations of using C++ STL containers and smart pointers. The profiling data will be analyzed for the behavior that could jeopardize software safety, for example, memory leak and memory external fragmentation. Our experiment results show that MemTrace can effectively provide detailed per-iteration results for AD software modules, and identify potential memory-related hazards. Through the profiling of prototype AD software with MemTrace, we have gained 6 insightful observations, including the potential risks associated with prolonged usage, and suggestions for effective utilization of STL containers and smart pointers in AD software, which can assist AD software developers during the development process.

Frequent coauthors

  • Steven Tang

    The Ohio State University

    13 shared
  • Emery D. Berger

    12 shared
  • Sam Silvestro

    The University of Texas at San Antonio

    11 shared
  • Jianjun Chen

    10 shared
  • Mingcan Xiang

    University of Massachusetts Amherst

    10 shared
  • Bo Wu

    Yibin University

    10 shared
  • Yang Wang

    The Ohio State University

    9 shared
  • Hongyu Liu

    8 shared

Labs

  • Interdisciplinary Interface Engineering Laboratory · PI

Awards & honors

  • UMass Board of Trustees Awards Tenure and Promotion to Six C…
  • UMass Amherst ADVANCE Fellows
  • NIH Trailblazer Awards