Yudong Chen

Associate Professor (verified profile)

University of Wisconsin-Madison · Computer Sciences

Active 1997–2026

h-index: 47

Citations: 8.6k

Papers: 307 (127 in the last 5 years)

Funding: $753k


About

Yudong Chen is an Associate Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. His research interests include machine learning, reinforcement learning, optimization, and high-dimensional statistics. His recent work focuses on reinforcement learning theory, non-convex and nonsmooth learning problems, stochastic optimization, and approximation. His research has been recognized with awards such as the NSF CAREER Award, the Vilas Associates Award, and paper awards from ACM SIGMETRICS, INFORMS, and the Applied Probability Society. Prior to his current position, Yudong Chen was an associate professor with tenure at the School of Operations Research and Information Engineering at Cornell University. He also completed a postdoctoral fellowship in the EECS Department at the University of California, Berkeley. He holds a Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin and obtained his B.S. and M.S. degrees in Automation from Tsinghua University.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Computer Security
Operating system
Mathematics
Algorithm
Distributed computing
Telecommunications
Statistics
Computer network

Selected publications

Exogenous sorbitol-chelated calcium mitigates toxicity of cadmium in peanut seedlings through physiological, biochemical, and transcriptomic regulation
Frontiers in Plant Science · 2026-03-23
Article · Open access
Introduction: Soil cadmium (Cd) contamination is a major abiotic stress for plants, and exogenous calcium has been shown to play a crucial role in plant stress resistance. Methods: Peanut (Arachis hypogaea L.) seedlings were exposed to Cd stress and supplied with either inorganic calcium or sorbitol-chelated calcium (SCC) at an equivalent calcium (Ca) concentration, and their responses were examined through integrated physiological, biochemical, and transcriptomic analyses. Results: Cd stress markedly inhibited the growth, photosynthetic activity, and root architecture of peanut seedlings and led to a significant accumulation of reactive oxygen species (ROS). Exogenous calcium effectively alleviated Cd toxicity, with SCC being particularly effective. Compared with Cd treatment alone, SCC significantly improved plant growth parameters and photosynthetic efficiency. SCC also significantly enhanced superoxide dismutase (SOD) activity in tissues while reducing malondialdehyde (MDA) and ROS levels, thereby mitigating membrane lipid oxidation. Transcriptomic analysis showed that SCC upregulated key genes, including AUX/IAA, GH3, SAUR, and JAZ, which have been implicated in promoting root growth and activating defence-related hormone pathways. Structural equation modelling further indicated that chlorophyll fluorescence had a significant positive influence on biomass accumulation, whereas excessive ROS and osmotic regulators were major inhibitory factors. Discussion: SCC thus mitigates Cd toxicity by stabilising the photosynthetic system, enhancing antioxidant defences, and regulating hormonal signalling, thereby promoting the recovery of peanut seedling growth.
The study provides new insights and a scientific basis for the efficient use of Ca-containing fertilisers and the mitigation of heavy metal pollution in agricultural fields.
RePaViT: Scalable Vision Transformer Acceleration via Structural Reparameterization on Feedforward Network Layers
ArXiv.org · 2025-05-28
Preprint · Open access
We reveal that feedforward network (FFN) layers, rather than attention layers, are the primary contributors to Vision Transformer (ViT) inference latency, and that their share of the latency grows with model size. This finding highlights a critical opportunity for improving the efficiency of large-scale ViTs by focusing on FFN layers. In this work, we propose a novel channel idle mechanism that enables post-training structural reparameterization of FFN layers for efficient inference. Specifically, a subset of feature channels in each FFN layer remains idle and bypasses the nonlinear activation function, forming a linear pathway that can be structurally reparameterized at inference time. This mechanism yields a family of ReParameterizable Vision Transformers (RePaViTs), which achieve substantial latency reductions with acceptable losses (and sometimes gains) in accuracy across various ViTs. The benefits of our method scale consistently with model size: larger models show greater speed-ups and progressively narrower accuracy gaps, or even higher accuracy. In particular, RePa-ViT-Large and RePa-ViT-Huge achieve 66.8% and 68.7% speed-ups with +1.7% and +1.1% higher top-1 accuracy, respectively, under the same training strategy. To the best of our knowledge, RePaViT is the first method to apply structural reparameterization to FFN layers to accelerate ViTs, and we believe it represents a promising direction for efficient ViTs. Source code is available at https://github.com/Ackesnal/RePaViT.
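The channel idle mechanism relies on a simple algebraic fact: hidden channels that skip the activation form a purely linear path, so their two projections (together with the residual shortcut) can be folded into a single matrix after training. The NumPy sketch below illustrates only that fact, under stated assumptions: FFN biases are omitted, GELU (tanh approximation) is the activation, the first `num_active` hidden channels are the nonlinear ones, and the FFN sits inside a residual connection. All function names are hypothetical; this is not the RePaViT code, which lives in the repository linked above.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (assumed activation for this sketch)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn_with_idle_channels(x, W1, W2, num_active):
    """Training-time residual FFN: the first num_active hidden channels go through
    GELU; the remaining 'idle' channels bypass the activation (a linear path)."""
    h = x @ W1.T                                  # (batch, hidden)
    h_act = gelu(h[:, :num_active])               # nonlinear channels
    h_idle = h[:, num_active:]                    # idle channels, no activation
    return x + np.concatenate([h_act, h_idle], axis=1) @ W2.T

def reparameterize(W1, W2, num_active):
    """Fold the idle linear path and the identity shortcut into one d x d matrix."""
    d = W1.shape[1]
    W_lin = np.eye(d) + W2[:, num_active:] @ W1[num_active:, :]
    return W1[:num_active, :], W2[:, :num_active], W_lin

def ffn_reparameterized(x, W1_act, W2_act, W_lin):
    """Inference-time FFN: a narrower nonlinear branch plus one merged linear map."""
    return gelu(x @ W1_act.T) @ W2_act.T + x @ W_lin.T

# Toy check: 8-dim tokens, 32 hidden channels, of which 8 are active and 24 idle.
rng = np.random.default_rng(0)
d, hidden, k = 8, 32, 8
x = rng.standard_normal((4, d))
W1 = rng.standard_normal((hidden, d))
W2 = rng.standard_normal((d, hidden))

y_train = ffn_with_idle_channels(x, W1, W2, k)
W1_act, W2_act, W_lin = reparameterize(W1, W2, k)
y_infer = ffn_reparameterized(x, W1_act, W2_act, W_lin)
```

In this toy setting the inference path replaces two 8 x 24 idle projections with a single merged 8 x 8 matrix, and the two outputs agree up to floating-point rounding; the saving grows as more channels are left idle.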
Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL
arXiv (Cornell University) · 2025-06-26
Preprint · Open access · Senior author
We study offline reinforcement learning in average-reward MDPs, a setting that poses increased challenges from the perspectives of distribution shift and non-uniform coverage and that has been relatively underexamined from a theoretical perspective. While previous work obtains performance guarantees under single-policy data coverage assumptions, such guarantees rely on additional complexity measures that are uniform over all policies, such as the uniform mixing time. We develop sharp guarantees that depend only on the target policy, specifically on the bias span and a novel policy hitting radius, yielding the first fully single-policy sample complexity bound for average-reward offline RL. We are also the first to handle general weakly communicating MDPs, in contrast to the restrictive structural assumptions made in prior work. To achieve this, we introduce an algorithm based on pessimistic discounted value iteration enhanced by a novel quantile clipping technique, which enables the use of a sharper empirical-span-based penalty function. Our algorithm also does not require any prior parameter knowledge for its implementation. Remarkably, we show via hard examples that learning under our conditions requires coverage assumptions beyond the stationary distribution of the target policy, distinguishing single-policy complexity measures from previously examined cases. We also develop lower bounds that nearly match our main result.
LoRA-One: One-Step Full Gradient Could Suffice for Fine-Tuning Large Language Models, Provably and Efficiently
ArXiv.org · 2025-02-03
Preprint · Open access · Senior author
This paper explores how theory can guide and enhance practical algorithms, using Low-Rank Adaptation (LoRA, Hu et al. 2022) of large language models as a case study. We rigorously prove that, under gradient descent, LoRA adapters align with specific singular subspaces of the one-step full fine-tuning gradient. This result suggests that, by properly initializing the adapters using the one-step full gradient, subspace alignment can be achieved immediately, for both linear and nonlinear models. Building on this theory, we propose a theory-driven algorithm, LoRA-One, for which we establish linear convergence (as well as generalization) and show that incorporating preconditioners theoretically helps mitigate the effects of ill-conditioning. Our theory also reveals connections between LoRA-One and other gradient-alignment-based methods, helping to clarify misconceptions in the design of such algorithms. LoRA-One achieves significant empirical improvements over LoRA and its variants across benchmarks in natural language understanding, mathematical reasoning, and code generation. Code is available at: https://github.com/YuanheZ/LoRA-One.
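As a rough illustration of the initialization idea described in the LoRA-One abstract, the sketch below computes the full fine-tuning gradient `G` at the pretrained weight of a toy linear least-squares model, then sets the LoRA factors from the top-`rank` singular subspace of `-eta * G`, so that `B @ A` equals the best rank-`rank` approximation of one full gradient step. The toy model, the step size `eta`, and the symmetric square-root split of singular values between `A` and `B` are our assumptions; the paper's exact scaling and preconditioning may differ, and `lora_one_style_init` is a hypothetical name, not the authors' API.

```python
import numpy as np

def full_gradient(W, X, Y):
    """Gradient of the least-squares loss 0.5/n * ||W X - Y||_F^2 with respect to W."""
    n = X.shape[1]
    return (W @ X - Y) @ X.T / n

def lora_one_style_init(G, rank, eta):
    """Hypothetical spectral init: B @ A is the best rank-`rank` approximation
    (Eckart-Young) of the one-step full update -eta * G."""
    U, S, Vt = np.linalg.svd(-eta * G, full_matrices=False)
    root = np.sqrt(S[:rank])            # split singular values evenly between factors
    B = U[:, :rank] * root              # (d_out, rank)
    A = root[:, None] * Vt[:rank]       # (rank, d_in)
    return A, B

# Toy fine-tuning task: the target weight differs from W0 by a rank-2 shift.
rng = np.random.default_rng(0)
d_out, d_in, n, r, eta = 6, 10, 200, 2, 1.0
W0 = rng.standard_normal((d_out, d_in))                                    # "pretrained" weight
Delta = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))  # low-rank shift
X = rng.standard_normal((d_in, n))
Y = (W0 + Delta) @ X                                                       # fine-tuning targets

def loss(W):
    return 0.5 / n * np.linalg.norm(W @ X - Y) ** 2

G = full_gradient(W0, X, Y)
A, B = lora_one_style_init(G, r, eta)
```

Because the toy shift `Delta` has rank 2, the one-step gradient `G = -Delta @ (X @ X.T / n)` is itself rank 2, so the rank-2 adapters reproduce the full gradient step exactly and `loss(W0 + B @ A)` is already below `loss(W0)` before any adapter training.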
Responses of Microbial Community to Heterogeneous Dissolved Organic Nitrogen Constituents in the Hyporheic Zones of Treated Sewage–Dominated Rivers
Microbial Ecology · 2025-07-02 · 2 citations
Article · Open access
The hyporheic zone (HZ) of treated sewage-dominated rivers serves as a critical biogeochemical hotspot for dissolved organic nitrogen (DON) transformation, yet the mechanisms linking DON chemodiversity to microbial community dynamics remain poorly resolved. This study integrated spectroscopic fingerprinting, machine learning, and partial least squares path modeling (PLS-PM) to unravel the interactions between redox-stratified DON fractions and microbial consortia in two effluent-impacted rivers (Xi'an, China). The results revealed that DOM spectral parameters associated with different DON characteristics had distinct effects on microbial communities: communities in oxic zones were largely affected by autobiogenic, aromatic, and protein-like DON, whereas communities in suboxic zones were more strongly affected by the humification degree of DON. Microbial communities exhibited redox-dependent niche differentiation; i.e., keystone taxa in oxic zones (e.g., Gammaproteobacteria) drove nitrogen assimilation, while suboxic taxa (e.g., Verrucomicrobia) prioritized stress-resistant D-amino acid metabolism. PLS-PM demonstrated that biomarkers exerted stronger control on nitrogen cycling (|path coefficient| > 0.6, P < 0.05) than keystone taxa, with summer communities showing a better model fit. Treated sewage-derived DON fostered specialized consortia through biochemical trade-offs, i.e., methionine recycling in oxic zones versus peptidoglycan modification in suboxic zones, highlighting the critical role of the HZ in mitigating nitrogen pollution. These findings advance predictive modeling of DON-microbe interactions in anthropogenically perturbed aquatic ecosystems.
Resolving Recapture Dynamics of Rydberg Electrons via Laser-Driven Frustrated Tunneling Ionization
Physical Review Letters · 2025-03-26 · 5 citations
Article
By employing two-color counterrotating circularly polarized laser fields, we investigate the dynamics of electron recapture into Rydberg states under strong, ultrashort laser pulses, probed via coherent extreme-ultraviolet free-induction decay (XFID). Our study reveals significant distinctions between XFID and above-threshold high-order harmonic generation in terms of their ellipticity dependence on the driving-laser waveforms, yield variations with the laser-intensity ratios, and sensitivity to the driving-laser ellipticity. All these differences arise from the fundamentally distinct electron trajectories underlying the two processes. More importantly, our findings provide compelling evidence that Rydberg-electron recapture predominantly occurs at the end of the driving laser field, offering the first direct experimental confirmation of this long-proposed mechanism.
Cost-effective dynamic sampling in high dimensional online monitoring with advantage actor-critic
International Journal of Production Research · 2025-01-21 · 2 citations
Article
Antibacterial Naphthoquinones from a Nicotiana tabacum Derived Endophytic Fusarium solani
Chemistry of Natural Compounds · 2025-01-01 · 1 citation
Article · First author · Corresponding author
Effects of Hydrochar Incorporation on the Nitrogen Leaching Flux Pattern and Load in Rice Paddy Soil and Crop Production
Plants · 2025-02-04 · 1 citation
Article · Open access · Corresponding author
Hydrochar (HC) incorporation affects soil nitrogen (N) transformation, which may in turn affect N leaching loss. We conducted a soil lysimeter experiment to evaluate how N leaching and rice yield respond to HC applied at a low (0.5%) or high (1.5%) rate under three N inputs: 240, 192, and 144 kg/ha (N240, N192, and N144, respectively; N192 and N144 correspond to 20% and 40% reductions relative to N240). Rice grain yield was highest (124.3 g/pot) for N192 and was significantly reduced, to the lowest yield in the study (110.3 g/pot), for N144. Interestingly, at the N input of 144 kg/ha, HC application increased rice grain yield by 6.9–8.0%, making it equivalent to that of N240. NH4+-N leaching occurred mainly during the first 4 weeks of the rice season, and HC did not influence NH4+-N leaching at either of the higher N inputs (192 and 240 kg/ha). However, compared with N144, N144 + HC1.5% recorded a significantly higher NH4+-N leaching loss of 34.6%, suggesting that applying a large amount of HC increases the risk of NH4+-N leaching when N input is low. Across the three N inputs, HC application resulted in 10.2–45.3% more NO3−-N leaching loss; this effect was significant only for the 20% and 40% N reduction treatments combined with 1.5% HC. Moreover, organic N was the main form of leachate N (>80%). More specifically, N144 + HC recorded 7.8–8.3% lower organic N leaching than N192. Based on the effects of HC on rice grain yield and N leaching, we recommend a 40% N reduction (N144) combined with the lower HC rate (0.5%) to ensure high crop production while protecting the water environment.
A Piecewise Lyapunov Analysis of Sub-quadratic SGD: Applications to Robust and Quantile Regression
2025-06-04 · 2 citations
Article
Motivated by robust and quantile regression problems, we investigate the stochastic gradient descent (SGD) algorithm for minimizing an objective function f that is locally strongly convex with a sub-quadratic tail. This setting covers many widely used online statistical methods. We introduce a novel piecewise Lyapunov function that enables us to handle functions f with only first-order differentiability, which includes a wide range of popular loss functions such as the Huber loss. Leveraging this Lyapunov function, we derive finite-time moment bounds under general diminishing stepsizes as well as constant stepsizes. We further establish weak convergence, a central limit theorem, and a bias characterization under constant stepsize, providing the first geometric convergence result for sub-quadratic SGD. Our results have wide applications, especially in online statistical methods. In particular, we discuss two applications. 1) Online robust regression: we consider a corrupted linear model with sub-exponential covariates and heavy-tailed noise, and our analysis provides convergence rates comparable to those for corrupted models with Gaussian covariates and noise. 2) Online quantile regression: importantly, our results relax the common assumption in prior work that the conditional density is continuous and provide a more fine-grained analysis of the moment bounds.
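The online robust regression application concerns plain SGD on a Huber-type loss, which is only first-order differentiable. The sketch below runs that update on simulated data: Gaussian covariates and Student-t noise with 2 degrees of freedom (heavy-tailed, infinite variance). The constant step size, the Huber threshold `delta`, the simulated data, and the Polyak-Ruppert averaging of the second half of the iterates are our own illustrative choices, not the paper's experiments, and `online_huber_sgd` is a hypothetical helper.

```python
import numpy as np

def huber_grad(r, delta=1.0):
    """Derivative of the Huber loss: identity for |r| <= delta, clipped beyond it."""
    return np.clip(r, -delta, delta)

def online_huber_sgd(stream, dim, step=0.01, delta=1.0, burn_in=0):
    """Constant-stepsize SGD on the Huber loss of the residual x @ theta - y,
    one sample per step; also returns a running average of iterates after burn_in."""
    theta = np.zeros(dim)
    avg = np.zeros(dim)
    count = 0
    for t, (x, y) in enumerate(stream):
        residual = x @ theta - y
        theta = theta - step * huber_grad(residual, delta) * x   # gradient of Huber(x @ theta - y)
        if t >= burn_in:
            count += 1
            avg += (theta - avg) / count                         # running Polyak-Ruppert average
    return theta, avg

# Simulated linear model with heavy-tailed noise (symmetric, so the
# population Huber minimizer coincides with theta_star).
rng = np.random.default_rng(0)
dim, n = 3, 20000
theta_star = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((n, dim))
noise = rng.standard_t(df=2, size=n)
y = X @ theta_star + noise

theta_last, theta_avg = online_huber_sgd(zip(X, y), dim, step=0.01, burn_in=n // 2)
```

With a constant step size the last iterate keeps fluctuating around `theta_star` rather than settling exactly on it, in line with the weak-convergence view in the abstract; averaging the iterates damps that fluctuation considerably.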

Recent grants

CIF: Medium: Collaborative Research: Nonconvex Optimization for High-Dimensional Signal Estimation: Theory and Fast Algorithms
NSF · $369k · 2017–2022
CAREER: Embracing Local Minima and Nonsmoothness in Nonconvex Statistical Estimation: From Structures to Algorithms
NSF · $209k · 2021–2022
CRII: CIF: Limits and Robustness of Nonconvex Low-Rank Estimation
NSF · $175k · 2017–2020

Frequent coauthors

Zhensheng Tao · 28 shared papers
Jiaming Xu · 26 shared papers
Zongyuan Fu (Fudan University) · 24 shared papers
Constantine Caramanis · 21 shared papers
Bingbing Zhu · 21 shared papers
Sainan Peng (Fudan University) · 20 shared papers
Xiaoshuai Hang (Ministry of Ecology and Environment) · 20 shared papers
Xiaodong Li · 20 shared papers

Labs

Convexified Modularity Maximization for Community Detection (PI)
Community detection in graphs using semidefinite programming relaxation and doubly weighted k-median clustering.

Education

Ph.D., Electrical and Computer Engineering
University of Texas at Austin
M.S., Automation
Tsinghua University
B.S., Automation
Tsinghua University

Awards & honors

NSF CAREER Award
Vilas Associates Award
INFORMS Paper Award from the Applied Probability Society
Best Student Paper Award from ACM SIGMETRICS 2023
INFORMS Applied Probability Society Best Student Paper Prize…
