Deming Chen

Professor · Verified

University of Illinois Urbana-Champaign · Statistics and Computer Science

Active 1982–2026

h-index: 46

Citations: 7.4k

Papers: 456 (159 in the last 5 years)

Funding: $505k

About

Deming Chen is the Abel Bliss Professor in the Grainger College of Engineering at the University of Illinois Urbana-Champaign. His research interests include machine learning and AI, system-level design methodologies, hybrid cloud systems, reconfigurable and heterogeneous computing, and security and confidential computing. He has published more than 300 research papers and received numerous awards, including 10 Best Paper Awards, 2 ACM/SIGDA TCFPGA Hall-of-Fame Paper Awards, and 5 Best Poster Awards. Dr. Chen has delivered over 170 invited talks, including more than 20 keynote and distinguished lectures, and his research has led to several open-source solutions adopted by industry, such as FCUDA, DNNBuilder, CSRNet, SkyNet, ScaleHLS, and Medusa. He is an ACM Fellow, an IEEE Fellow, and an ACM Distinguished Speaker, and has served as Editor-in-Chief of ACM Transactions on Reconfigurable Technology and Systems (TRETS), where he increased the impact factor significantly. Dr. Chen also serves as the Illinois Director of the IBM-Illinois Discovery Accelerator Institute and the Director of the AMD Center of Excellence. His work includes projects such as SnapKV, Medusa, ISDC, PandoGen, NimBlock, and ScaleHLS+HIDA, which address challenges in machine learning, hardware design, and system optimization.

Research topics

Computer Science
Artificial Intelligence
Biology
Genetics
Machine Learning
Data Mining
Computer engineering
Mathematics
Parallel computing
Embedded system
Computational biology
Computer architecture

Selected publications

Report for NSF Workshop on AI for Electronic Design Automation
ArXiv.org · 2026-01-20
article · Open access · 1st author · Corresponding
This report distills the discussions and recommendations from the NSF Workshop on AI for Electronic Design Automation (EDA), held on December 10, 2024 in Vancouver alongside NeurIPS 2024. Bringing together experts across machine learning and EDA, the workshop examined how AI, spanning large language models (LLMs), graph neural networks (GNNs), reinforcement learning (RL), neurosymbolic methods, and more, can facilitate EDA and shorten design turnaround. The workshop covered four themes: (1) AI for physical synthesis and design for manufacturing (DFM), discussing challenges in the physical manufacturing process and potential AI applications; (2) AI for high-level and logic-level synthesis (HLS/LLS), covering pragma insertion, program transformation, RTL code generation, etc.; (3) AI toolbox for optimization and design, discussing frontier AI developments that could potentially be applied to EDA tasks; and (4) AI for test and verification, including LLM-assisted verification tools, ML-augmented SAT solving, security/reliability challenges, etc. The report recommends that NSF foster AI/EDA collaboration, invest in foundational AI for EDA, develop robust data infrastructures, promote scalable compute infrastructure, and invest in workforce development to democratize hardware design and enable next-generation hardware systems. The workshop information can be found on the website https://ai4eda-workshop.github.io/.
CASCADE: Context-Aware Relaxation for Speculative Image Decoding
ArXiv.org · 2026-05-08
article · Open access · Senior author
Autoregressive generation is a powerful approach for high-fidelity image synthesis, but it remains computationally demanding and slow even on the most advanced accelerators. While speculative decoding has been explored to mitigate this bottleneck, existing approaches fail to achieve efficiency gains comparable to those observed in text generation. A key limitation is the target model's high uncertainty during image generation, which leads to high draft token rejection rates. In this work, we identify previously overlooked patterns in the target model's behavior that emerge naturally in tree-based speculative decoding. Specifically, we formalize two properties, semantic interchangeability and convergence, arising from the redundancies in the target model's hidden state representations. By capturing these redundancies across the depth and breadth of the predicted token tree, our method identifies principled opportunities for acceptance relaxation without requiring additional training. Additionally, we enhance standalone drafter performance by injecting the redundancy signals from the target model into drafter training with minimal modification. We evaluate our approach across multiple text-to-image models and drafter architectures. Results show that CASCADE achieves state-of-the-art speedups for drafter-based speculative decoding, with up to 3.6x acceleration, while maintaining image quality and text-prompt fidelity.
Report for NSF Workshop on AI for Electronic Design Automation
Open MIND · 2026-01-20
preprint · 1st author · Corresponding
ICLAD 2025: Agents, Benchmarks, and the Next Wave of LLM-Aided Design [IEEE News]
IEEE Solid-State Circuits Magazine · 2026-01-01
article
CASCADE: Context-Aware Relaxation for Speculative Image Decoding
arXiv (Cornell University) · 2026-05-08
preprint · Open access · Senior author
Blasting ore size detection based on efficient dehazing network and multi-dimensional feature fusion
Scientific Reports · 2026-02-28
article · Open access
Ore particle size distribution is an important metric for evaluating blasting outcomes and affects the energy consumption of ore crushing equipment. To address dense accumulation of ore, nonuniform size distributions, dust occlusion, and target loss due to motion, we propose a computer-vision method for blasting ore size detection based on an efficient dehazing network and multi-dimensional feature fusion, which improves on YOLOv8. First, we construct an efficient defogging backbone network that combines feature attention with a composite scalable backbone, so that the model can efficiently extract features from ore images and is more robust to dust interference in the ore crushing process. Second, we introduce a new feature fusion network that combines a convolution model with the Vmamba sequence model and adds cross-layer fusion of multi-scale features, so that the model can adapt to the dramatic scale changes of blasting ore, capture both fine and large ore, avoid missed detections, and improve the accuracy of particle size statistics. Finally, we apply the multi-dimensional feature fusion of Dynamic Head to the detection head, further refining feature fusion so that the feature tensor obtained from the ore image is adapted to the ore detection and localization task and the model's ability to discriminate ore improves. Experiments were conducted on a manually labeled jaw fracture ore dataset. Compared to the YOLOv8n algorithm, the average precision for detecting eight size categories of ore increased by 7%. On datasets containing interference such as smoke, dust, and wet conditions, the mean average precision at an IoU threshold of 0.5 (mAP50) improved by 7.6%. For fine ores below D5 (72 mm), the detection precision increased by 18.8%, while the recall rate rose by 13.8%. On the total one-class dataset, the recall rate and mAP50 reached 84% and 88.1%, respectively.
Proof2Silicon: Prompt Repair for Verified Code and Hardware Generation via Reinforcement Learning
ArXiv.org · 2025-09-07
preprint · Open access · Senior author
Large Language Models (LLMs) have demonstrated impressive capabilities in automated code generation but frequently produce code that fails formal verification, an essential requirement for hardware and safety-critical domains. To overcome this fundamental limitation, we previously proposed PREFACE, a model-agnostic framework based on reinforcement learning (RL) that iteratively repairs the prompts provided to frozen LLMs, systematically steering them toward generating formally verifiable Dafny code without costly fine-tuning. This work presents Proof2Silicon, a novel end-to-end synthesis framework that embeds the previously proposed PREFACE flow to enable the generation of correctness-by-construction hardware directly from natural language specifications. Proof2Silicon operates by: (1) leveraging PREFACE's verifier-driven RL agent to optimize prompt generation iteratively, ensuring Dafny code correctness; (2) automatically translating verified Dafny programs into synthesizable high-level C using Dafny's Python backend and PyLog; and (3) employing Vivado HLS to produce RTL implementations. Evaluated rigorously on a challenging 100-task benchmark, PREFACE's RL-guided prompt optimization consistently improved Dafny verification success rates across diverse LLMs by up to 21%. Crucially, Proof2Silicon achieved an end-to-end hardware synthesis success rate of up to 72%, generating RTL designs through Vivado HLS synthesis flows. These results demonstrate a robust, scalable, and automated pipeline for LLM-driven, formally verified hardware synthesis, bridging natural-language specification and silicon realization.
PREFACE - A Reinforcement Learning Framework for Code Verification via LLM Prompt Repair
2025-06-27
article · Open access · Senior author
StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs
2025-10-17 · 2 citations
article · Open access · Senior author
Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve efficiency, existing approaches struggle with inter-kernel correlations, external memory access management, and buffer optimization. In this work, we propose StreamTensor, a compiler framework that automatically constructs and optimizes stream-based dataflow accelerators. StreamTensor introduces a novel iterative tensor type system to explicitly encode stream layouts, enabling seamless kernel fusion, buffer allocation, and memory optimization. By systematically exploring three hierarchical design spaces, including tensor tiling, kernel fusion, and resource allocation, StreamTensor balances computational intensity, memory efficiency, and data streaming to maximize performance. Based on FPGA evaluations on Large Language Models (LLM), StreamTensor achieves up to 0.76x and 0.64x lower latency compared to the state-of-the-art FPGA LLM accelerators and GPUs, and up to 1.99x higher energy efficiency compared to GPUs, making it a promising approach for scalable dataflow-based deep learning acceleration.
MLCD: Machine Learning-Based Code Version and Device Selection for Heterogeneous Systems
IEEE Transactions on Computers · 2025-04-08
article · Senior author
Heterogeneous systems with hardware accelerators are increasingly common, and various optimized implementations/algorithms exist for computation kernels. However, no single combination of code version and device (C&D) can outperform the others across all input cases, demanding a method to select the best C&D pair based on input. We present a machine learning-based code version and device selection method, named MLCD, that uses input data characteristics to select the best C&D pair dynamically. We also apply active learning to reduce the number of samples needed to construct the model. Demonstrated on two different CPU-GPU systems, MLCD achieves near-optimal speed-up regardless of the system tested. Concretely, on system one, which has mid-range hardware, it achieves 99.9%, 95.6%, 99.9%, and 98.6% of the optimal acceleration attainable through the ideal choice of C&D pairs in General Matrix Multiply, PageRank, N-body Simulation, and K-Motif Counting, respectively. MLCD achieves speed-ups of 2.57×, 1.58×, 2.68×, and 1.09× compared to baselines without MLCD. Additionally, MLCD handles end-to-end applications, achieving up to 10% and 46% speed-up over GPU-only and CPU-only solutions with Graph Neural Networks. Furthermore, it achieves 7.28× average speed-up in execution latency over the state-of-the-art approach and determines suitable code versions for unseen input 10^8 to 10^10× faster.

Recent grants

CAREER: Nano-Centric Design Methodology for Nanoscale FPGAs
NSF · $400k · 2008–2014
Collaborative Research: From High-level Synthesis to Layout: a Cross-layer Methodology for Large-scale Reliable IC Design
NSF · $105k · 2013–2016

Frequent coauthors

Wen‐mei Hwu
University of Illinois Urbana-Champaign
59 shared
Yao Chen
51 shared
Jinjun Xiong
50 shared
Xiaofan Zhang
Google (United States)
44 shared
Cong Hao
42 shared
Kyle Rupnow
39 shared
Takumi Maruyama
Fujitsu (Japan)
36 shared
A. Kasukawa
Furukawa Electric (Japan)
36 shared

Education

Ph.D., Computer Science
University of California Los Angeles

Awards & honors

10 Best Paper Awards
2 ACM/SIGDA TCFPGA Hall-of-Fame Paper Awards
5 Best Poster Awards
ACM SIGDA Distinguished Service Award
First Place Winner Award for International Hardware/System D…
