Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Deming Chen

Professor · Verified

University of Illinois Urbana-Champaign · Statistics and Computer Science

Active 1982–2026

h-index: 46
Citations: 7.4k
Papers: 456 (159 in the last 5 years)
Funding: $505k

About

Deming Chen is the Abel Bliss Professor in the Grainger College of Engineering at the University of Illinois Urbana-Champaign. His research interests include machine learning and AI, system-level design methodologies, hybrid cloud systems, reconfigurable and heterogeneous computing, and security and confidential computing. He has published more than 300 research papers and received numerous awards, including 10 Best Paper Awards, 2 ACM/SIGDA TCFPGA Hall-of-Fame Paper Awards, and 5 Best Poster Awards. Dr. Chen has delivered over 170 invited talks, including more than 20 keynote and distinguished lectures, and his research has led to several open-source solutions adopted by industry, such as FCUDA, DNNBuilder, CSRNet, SkyNet, ScaleHLS, and Medusa. His recognitions include ACM Fellow, IEEE Fellow, and ACM Distinguished Speaker, and he has served as Editor-in-Chief of ACM Transactions on Reconfigurable Technology and Systems (TRETS), where he increased the impact factor significantly. Dr. Chen also serves as the Illinois Director of the IBM-Illinois Discovery Accelerator Institute and the Director of the AMD Center of Excellence. His work includes approaches such as SnapKV, Medusa, ISDC, PandoGen, NimBlock, and ScaleHLS+HIDA, which address challenges in machine learning, hardware design, and system optimization.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Biology
  • Genetics
  • Machine Learning
  • Data Mining
  • Computer engineering
  • Mathematics
  • Parallel computing
  • Embedded system
  • Computational biology
  • Computer architecture

Selected publications

  • Report for NSF Workshop on AI for Electronic Design Automation

    ArXiv.org · 2026-01-20

    article · Open access · 1st author · Corresponding

    This report distills the discussions and recommendations from the NSF Workshop on AI for Electronic Design Automation (EDA), held on December 10, 2024 in Vancouver alongside NeurIPS 2024. Bringing together experts across machine learning and EDA, the workshop examined how AI, spanning large language models (LLMs), graph neural networks (GNNs), reinforcement learning (RL), neurosymbolic methods, etc., can facilitate EDA and shorten design turnaround. The workshop covered four themes: (1) AI for physical synthesis and design for manufacturing (DFM), discussing challenges in the physical manufacturing process and potential AI applications; (2) AI for high-level and logic-level synthesis (HLS/LLS), covering pragma insertion, program transformation, RTL code generation, etc.; (3) AI toolbox for optimization and design, discussing frontier AI developments that could potentially be applied to EDA tasks; and (4) AI for test and verification, including LLM-assisted verification tools, ML-augmented SAT solving, security/reliability challenges, etc. The report recommends that NSF foster AI/EDA collaboration, invest in foundational AI for EDA, develop robust data infrastructures, promote scalable compute infrastructure, and invest in workforce development to democratize hardware design and enable next-generation hardware systems. The workshop information can be found on the website https://ai4eda-workshop.github.io/.

  • CASCADE: Context-Aware Relaxation for Speculative Image Decoding

    ArXiv.org · 2026-05-08

    article · Open access · Senior author

    Autoregressive generation is a powerful approach for high-fidelity image synthesis, but it remains computationally demanding and slow even on the most advanced accelerators. While speculative decoding has been explored to mitigate this bottleneck, existing approaches fail to achieve efficiency gains comparable to those observed in text generation. A key limitation is the target model's high uncertainty during image generation, which leads to high draft token rejection rates. In this work, we identify previously overlooked patterns in the target model's behavior that emerge naturally in tree-based speculative decoding. Specifically, we formalize two properties, semantic interchangeability and convergence, arising from the redundancies in the target model's hidden state representations. By capturing these redundancies across the depth and breadth of the predicted token tree, our method identifies principled opportunities for acceptance relaxation without requiring additional training. Additionally, we enhance standalone drafter performance by injecting the redundancy signals from the target model into drafter training with minimal modification. We evaluate our approach across multiple text-to-image models and drafter architectures. Results show that CASCADE achieves state-of-the-art speedups for drafter-based speculative decoding, with up to 3.6x acceleration, while maintaining image quality and text-prompt fidelity.

  • Report for NSF Workshop on AI for Electronic Design Automation

    Open MIND · 2026-01-20

    preprint · 1st author · Corresponding

  • ICLAD 2025: Agents, Benchmarks, and the Next Wave of LLM-Aided Design [IEEE News]

    IEEE Solid-State Circuits Magazine · 2026-01-01

    article

  • CASCADE: Context-Aware Relaxation for Speculative Image Decoding

    arXiv (Cornell University) · 2026-05-08

    preprint · Open access · Senior author

  • Blasting ore size detection based on efficient dehazing network and multi-dimensional feature fusion

    Scientific Reports · 2026-02-28

    article · Open access

    Ore particle size distribution is an important metric for evaluating blasting outcomes and affects the energy consumption of ore crushing equipment. To handle dense accumulation of ore, nonuniform size distributions, dust occlusion, and target loss due to motion, we propose a computer-vision method for blasting ore size detection based on an efficient dehazing network and multi-dimensional feature fusion, which improves on YOLOv8. First, we construct an efficient dehazing backbone network that combines feature attention with a composite scalable backbone, so that the model can efficiently extract features from ore images and is more robust to dust interference in the ore crushing process. Second, we introduce a new feature fusion network that combines the convolution model and the Vmamba sequence model with cross-layer fusion of multi-scale features, so that the model can adapt to the dramatic scale changes of blasting ore, capture both fine and large ore, avoid ore omission, and improve the accuracy of particle size statistics. Finally, the multi-dimensional feature fusion ability of Dynamic Head is introduced to optimize the target detection head and further refine feature fusion, so that the feature tensor obtained from the ore image is adapted to the ore detection and localization task and the model's ability to discriminate ore is improved. Experiments were conducted on a manually labeled jaw fracture ore dataset. Compared to the YOLOv8n algorithm, the average precision for detecting eight size categories of ore increased by 7%. On datasets containing interference such as smoke, dust, and wet conditions, the mean average precision at the IoU threshold of 0.5 (mAP50) improved by 7.6%. For fine ores below D5 (72 mm), the detection precision increased by 18.8%, while the recall rate rose by 13.8%. On the total one-class dataset, the recall rate and mAP50 reached 84% and 88.1%, respectively.

  • Proof2Silicon: Prompt Repair for Verified Code and Hardware Generation via Reinforcement Learning

    ArXiv.org · 2025-09-07

    preprint · Open access · Senior author

    Large Language Models (LLMs) have demonstrated impressive capabilities in automated code generation but frequently produce code that fails formal verification, an essential requirement for hardware and safety-critical domains. To overcome this fundamental limitation, we previously proposed PREFACE, a model-agnostic framework based on reinforcement learning (RL) that iteratively repairs the prompts provided to frozen LLMs, systematically steering them toward generating formally verifiable Dafny code without costly fine-tuning. This work presents Proof2Silicon, a novel end-to-end synthesis framework that embeds the previously proposed PREFACE flow to enable the generation of correctness-by-construction hardware directly from natural language specifications. Proof2Silicon operates by: (1) leveraging PREFACE's verifier-driven RL agent to optimize prompt generation iteratively, ensuring Dafny code correctness; (2) automatically translating verified Dafny programs into synthesizable high-level C using Dafny's Python backend and PyLog; and (3) employing Vivado HLS to produce RTL implementations. Evaluated rigorously on a challenging 100-task benchmark, PREFACE's RL-guided prompt optimization consistently improved Dafny verification success rates across diverse LLMs by up to 21%. Crucially, Proof2Silicon achieved an end-to-end hardware synthesis success rate of up to 72%, generating RTL designs through Vivado HLS synthesis flows. These results demonstrate a robust, scalable, and automated pipeline for LLM-driven, formally verified hardware synthesis, bridging natural-language specification and silicon realization.

  • PREFACE - A Reinforcement Learning Framework for Code Verification via LLM Prompt Repair

    2025-06-27

    article · Open access · Senior author

  • StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs

    2025-10-17 · 2 citations

    article · Open access · Senior author

    Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve efficiency, existing approaches struggle with inter-kernel correlations, external memory access management, and buffer optimization. In this work, we propose StreamTensor, a compiler framework that automatically constructs and optimizes stream-based dataflow accelerators. StreamTensor introduces a novel iterative tensor type system to explicitly encode stream layouts, enabling seamless kernel fusion, buffer allocation, and memory optimization. By systematically exploring three hierarchical design spaces, including tensor tiling, kernel fusion, and resource allocation, StreamTensor balances computational intensity, memory efficiency, and data streaming to maximize performance. Based on FPGA evaluations on Large Language Models (LLM), StreamTensor achieves up to 0.76x and 0.64x lower latency compared to the state-of-the-art FPGA LLM accelerators and GPUs, and up to 1.99x higher energy efficiency compared to GPUs, making it a promising approach for scalable dataflow-based deep learning acceleration.

  • MLCD: Machine Learning-Based Code Version and Device Selection for Heterogeneous Systems

    IEEE Transactions on Computers · 2025-04-08

    article · Senior author

    Heterogeneous systems with hardware accelerators are increasingly common, and various optimized implementations/algorithms exist for computation kernels. However, no single combination of code version and device (C&D) can outperform others across all input cases, demanding a method to select the best C&D pair based on input. We present a machine learning-based code version and device selection method, named MLCD, that uses input data characteristics to select the best C&D pair dynamically. We also apply active learning to reduce the number of samples needed to construct the model. Demonstrated on two different CPU-GPU systems, MLCD achieves near-optimal speed-up regardless of which system is tested. Concretely, on system one with mid-end hardware, it achieves 99.9%, 95.6%, 99.9%, and 98.6% of the optimal acceleration attainable through the ideal choice of C&D pairs in General Matrix Multiply, PageRank, N-body Simulation, and K-Motif Counting, respectively. MLCD achieves a speed-up of 2.57×, 1.58×, 2.68×, and 1.09× compared to baselines without MLCD. Additionally, MLCD handles end-to-end applications, achieving up to 10% and 46% speed-up over GPU-only and CPU-only solutions with Graph Neural Networks. Furthermore, it achieves 7.28× average speed-up in execution latency over the state-of-the-art approach and determines suitable code versions for unseen input 10^8 to 10^10× faster.

Frequent coauthors

  • Wen‐mei Hwu

    University of Illinois Urbana-Champaign

    59 shared
  • Yao Chen

    51 shared
  • Jinjun Xiong

    50 shared
  • Xiaofan Zhang

    Google (United States)

    44 shared
  • Cong Hao

    42 shared
  • Kyle Rupnow

    39 shared
  • Takumi Maruyama

    Fujitsu (Japan)

    36 shared
  • A. Kasukawa

    Furukawa Electric (Japan)

    36 shared

Education

  • Ph.D., Computer Science

    University of California Los Angeles

Awards & honors

  • 10 Best Paper Awards
  • 2 ACM/SIGDA TCFPGA Hall-of-Fame Paper Awards
  • 5 Best Poster Awards
  • ACM SIGDA Distinguished Service Award
  • First Place Winner Award for International Hardware/System D…