
Ermin Wei
· Associate Professor of Electrical and Computer Engineering, Industrial Engineering and Management Sciences and (by courtesy) Computer ScienceVerifiedNorthwestern University · Chemical Engineering
Active 2005–2026
About
Ermin Wei is an Associate Professor of Electrical and Computer Engineering, Industrial Engineering and Management Sciences, and (by courtesy) Computer Science at Northwestern University. His research interests focus on the control and operation of networked systems, with particular emphasis on market analysis of smart grids and energy networks. His work also includes the development of large-scale distributed optimization algorithms and theory, with a focus on nonlinear convex optimization, network optimization, asynchronous algorithms, and their applications. Wei's contributions aim to advance the understanding and efficiency of complex networked systems, impacting areas such as energy management and distributed computing.
Research topics
- Computer Science
- Computer Security
- Mathematical optimization
- Economics
- Artificial Intelligence
- Data Mining
- Mathematics
- Computer network
- Operations research
- Database
- Microeconomics
- Algorithm
Selected publications
Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback
ArXiv.org · 2026-04-21
articleOpen accessSenior authorSafe Reinforcement Learning from Human Feedback (Safe RLHF) has recently achieved empirical success in developing helpful and harmless large language models by decoupling human preferences regarding helpfulness and harmlessness. Existing approaches typically rely on fitting fixed horizon reward models from human feedback and have only been validated empirically. In this paper, we formulate safe RLHF as an infinite horizon discounted Con- strained Markov Decision Process (CMDP), since humans may interact with the model over a continuing sequence of interactions rather than within a single finite episode. We propose two Safe RLHF algorithms that do not require reward model fitting and, in contrast to prior work assuming fixed-length trajectories, support flexible trajectory lengths for training. Both algo- rithms are based on the primal-dual method and achieve global convergence guarantees with polynomial rates in terms of policy gradient iterations, trajectory sample lengths, and human preference queries. To the best of our knowledge, this is the first work to study infinite horizon discounted CMDP under human feedback and establish global, non-asymptotic convergence.
Distributed Edge Computing Task Allocation with Network Effects
Open MIND · 2026-02-13
preprintSenior authorField-deployable edge computing nodes form a network and are used to complete scientific tasks for remote sensing and monitoring. The networked nodes collectively decide which scientific applications to run while they are constrained by various factors, such as differing hardware constraints from heterogeneous nodes and time-varying quality of service (QoS) requirements. We model the problem of task allocation as an optimization problem that maximizes the QoS, subject to the constraints. We solve the optimization problem using a dual-descent method, which can be easily implemented in a distributed way subject to the communication constraints of the network. Using a simulation that uses real-world data collected from Sage, a distributed sensor network, we analyze our policy's performance in dynamic situations where the required QoS and the nodes' capabilities change, and verify that it can adapt and return a feasible solution while accounting for those changes.
Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
arXiv (Cornell University) · 2026-05-07
preprintOpen accessSenior authorOptimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning -- most recently KL-Shampoo, which estimates the preconditioner via KL divergence minimization -- and orthogonalization of the gradient momentum, exemplified by Muon and analyzed as steepest descent under the spectral norm. The two routes are typically developed in isolation. We make a structural observation about KL-Shampoo's Kronecker preconditioners: their eigenvalue spectra exhibit a \emph{spike-and-flat} shape -- a few dominant eigenvalues followed by an approximately uniform tail -- across layers and training stages, holding exactly under a rank-$ρ$ signal-plus-noise gradient model. We exploit this structure by restricting one of KL-Shampoo's Kronecker factors to a parametric family aligned with the spike-and-flat shape: full spectral structure on a tracked $r$-dimensional subspace, single shared eigenvalue across the remaining $n-r$ directions. On these directions, we apply orthogonalization. An identity shows that this orthogonalization recovers the algebraic form of full KL-Shampoo's preconditioner. On four pre-training scales (GPT-2 124M / 350M, LLaMA 134M / 450M), Pro-KLShampoo consistently outperforms KL-Shampoo at every subspace rank we test in validation loss, peak per-GPU memory, and wallclock time to reach each loss level.
Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
ArXiv.org · 2026-05-07
articleOpen accessSenior authorOptimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning -- most recently KL-Shampoo, which estimates the preconditioner via KL divergence minimization -- and orthogonalization of the gradient momentum, exemplified by Muon and analyzed as steepest descent under the spectral norm. The two routes are typically developed in isolation. We make a structural observation about KL-Shampoo's Kronecker preconditioners: their eigenvalue spectra exhibit a \emph{spike-and-flat} shape -- a few dominant eigenvalues followed by an approximately uniform tail -- across layers and training stages, holding exactly under a rank-$ρ$ signal-plus-noise gradient model. We exploit this structure by restricting one of KL-Shampoo's Kronecker factors to a parametric family aligned with the spike-and-flat shape: full spectral structure on a tracked $r$-dimensional subspace, single shared eigenvalue across the remaining $n-r$ directions. On these directions, we apply orthogonalization. An identity shows that this orthogonalization recovers the algebraic form of full KL-Shampoo's preconditioner. On four pre-training scales (GPT-2 124M / 350M, LLaMA 134M / 450M), Pro-KLShampoo consistently outperforms KL-Shampoo at every subspace rank we test in validation loss, peak per-GPU memory, and wallclock time to reach each loss level.
Distributed Edge Computing Task Allocation with Network Effects
arXiv (Cornell University) · 2026-02-13
articleOpen accessSenior authorField-deployable edge computing nodes form a network and are used to complete scientific tasks for remote sensing and monitoring. The networked nodes collectively decide which scientific applications to run while they are constrained by various factors, such as differing hardware constraints from heterogeneous nodes and time-varying quality of service (QoS) requirements. We model the problem of task allocation as an optimization problem that maximizes the QoS, subject to the constraints. We solve the optimization problem using a dual-descent method, which can be easily implemented in a distributed way subject to the communication constraints of the network. Using a simulation that uses real-world data collected from Sage, a distributed sensor network, we analyze our policy's performance in dynamic situations where the required QoS and the nodes' capabilities change, and verify that it can adapt and return a feasible solution while accounting for those changes.
Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback
arXiv (Cornell University) · 2026-04-21
preprintOpen accessSenior authorSafe Reinforcement Learning from Human Feedback (Safe RLHF) has recently achieved empirical success in developing helpful and harmless large language models by decoupling human preferences regarding helpfulness and harmlessness. Existing approaches typically rely on fitting fixed horizon reward models from human feedback and have only been validated empirically. In this paper, we formulate safe RLHF as an infinite horizon discounted Con- strained Markov Decision Process (CMDP), since humans may interact with the model over a continuing sequence of interactions rather than within a single finite episode. We propose two Safe RLHF algorithms that do not require reward model fitting and, in contrast to prior work assuming fixed-length trajectories, support flexible trajectory lengths for training. Both algo- rithms are based on the primal-dual method and achieve global convergence guarantees with polynomial rates in terms of policy gradient iterations, trajectory sample lengths, and human preference queries. To the best of our knowledge, this is the first work to study infinite horizon discounted CMDP under human feedback and establish global, non-asymptotic convergence.
Incentivized Federated Learning and Unlearning
IEEE Transactions on Mobile Computing · 2025-04-04 · 3 citations
articleTo protect users' <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">right to be forgotten</i> in federated learning, federated unlearning aims at eliminating the impact of leaving users' data on the global learned model. The current research in federated unlearning mainly concentrates on developing effective and efficient unlearning techniques. However, the issue of incentivizing valuable users to remain engaged and preventing their data from being unlearned is still under-explored, yet important to the unlearned model performance. This paper focuses on the incentive issue and develops an incentive mechanism for federated learning and unlearning. We first characterize the leaving users' impact on the global model accuracy and the required communication rounds for unlearning. Building on these results, we propose a four-stage game to capture the interaction and information updates during the learning and unlearning process. A key contribution is to summarize users' multi-dimensional private information into one-dimensional metrics to guide the incentive design. Interestingly, we prove that allowing federated unlearning can result in reduced payoffs for both the server and users, compared to a scenario without unlearning. Numerical results demonstrate the necessity of unlearning incentives for retaining valuable leaving users, and also show that our proposed mechanisms decrease the server's cost by up to 53.91% compared to state-of-the-art benchmarks.
Optimal Battery Placement in Power Grid
ArXiv.org · 2025-07-14
preprintOpen accessWe study the optimal placement of an unlimited-capacity battery in power grids under a centralized market model, where the independent system operator (ISO) aims to minimize total generation costs through load shifting. The optimal battery placement is not well understood by the existing literature, especially regarding the influence of network topology on minimizing generation costs. Our work starts with decomposing the Mixed-Integer Linear Programming (MILP) problem into a series of Linear Programming (LP) formulations. For power grids with sufficiently large generation capacity or tree topologies, we derive analytical cost expressions demonstrating that, under reasonable assumptions, the weighted degree is the only topological factor for optimal battery placement. We also discuss the minor impact of higher-order topological conditions on tree-topology networks. To find the localized nature of a single battery's impact, we establish that the relative cost-saving benefit of a single battery decreases as the network scales. Furthermore, we design a low-complexity algorithm for weakly-cyclic networks. Numerical experiments show that our algorithm is not only approximately 100 times faster than commercial solvers but also maintains high accuracy even when some theoretical assumptions are relaxed.
A Decomposition Framework for Nonlinear Nonconvex Two-Stage Optimization
arXiv (Cornell University) · 2025-01-20
preprintOpen accessSenior authorWe propose a new decomposition framework for continuous nonlinear constrained two-stage optimization, where both first- and second-stage problems can be nonconvex. A smoothing technique based on an interior-point formulation renders the optimal solution of the second-stage problem differentiable with respect to the first-stage parameters. As a consequence, efficient off-the-shelf optimization packages can be utilized. We show that the solution of the nonconvex second-stage problem behaves locally like a differentiable function so that existing proofs can be applied to prove the convergence of the iterates to first-order optimal points for the first-stage. We also prove fast local convergence of the algorithm as the barrier parameter is driven to zero. Numerical experiments for large-scale instances demonstrate the computational advantages of the decomposition framework.
Incentive Analysis for Agent Participation in Federated Learning
ArXiv.org · 2025-03-12
preprintOpen accessSenior authorFederated learning offers a decentralized approach to machine learning, where multiple agents collaboratively train a model while preserving data privacy. In this paper, we investigate the decision-making and equilibrium behavior in federated learning systems, where agents choose between participating in global training or conducting independent local training. The problem is first modeled as a stage game and then extended to a repeated game to analyze the long-term dynamics of agent participation. For the stage game, we characterize the participation patterns and identify Nash equilibrium, revealing how data heterogeneity influences the equilibrium behavior-specifically, agents with similar data qualities will participate in FL as a group. We also derive the optimal social welfare and show that it coincides with Nash equilibrium under mild assumptions. In the repeated game, we propose a privacy-preserving, computationally efficient myopic strategy. This strategy enables agents to make practical decisions under bounded rationality and converges to a neighborhood of Nash equilibrium of the stage game in finite time. By combining theoretical insights with practical strategy design, this work provides a realistic and effective framework for guiding and analyzing agent behaviors in federated learning systems.
Recent grants
NRI: INT: Robotic Shepherding for Flow Control in Uncertain Dynamic Environments
NSF · $1.4M · 2020–2025
Frequent coauthors
- 30 shared
Randall A. Berry
- 12 shared
Meng Zhang
- 10 shared
Haoran Yu
Beijing Institute of Technology
- 10 shared
Asuman Ozdaglar
- 9 shared
Charikleia Iakovidou
- 8 shared
Fatemeh Mansoori
Northwestern University
- 6 shared
Shenyinying Tu
Northwestern University
- 6 shared
Binnan Zhuang
Labs
Communications and Networking LaboratoryPI
Education
- 2014
MS, PhD, Electrical Engineering and Computer Science
Massachusetts Institute of Technology
- 2008
BS, Computer Engineering, Math, Finance
University of Maryland
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Ermin Wei
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup