Ermin Wei

Associate Professor of Electrical and Computer Engineering, Industrial Engineering and Management Sciences, and (by courtesy) Computer Science · Verified

Northwestern University · Electrical and Computer Engineering

Active 2005–2026

h-index: 16

Citations: 1.8k

Papers: 115 (57 in the last 5 years)

Funding: $1.4M

About

Ermin Wei is an Associate Professor of Electrical and Computer Engineering, Industrial Engineering and Management Sciences, and (by courtesy) Computer Science at Northwestern University. Wei's research focuses on the control and operation of networked systems, with particular emphasis on market analysis of smart grids and energy networks. It also covers large-scale distributed optimization algorithms and theory, including nonlinear convex optimization, network optimization, asynchronous algorithms, and their applications. This work aims to improve the understanding and efficiency of complex networked systems, with applications in energy management and distributed computing.

Research topics

Computer Science
Computer Security
Mathematical optimization
Economics
Artificial Intelligence
Data Mining
Mathematics
Computer network
Operations research
Database
Microeconomics
Algorithm

Selected publications

Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback
ArXiv.org · 2026-04-21
article · Open access · Senior author
Safe Reinforcement Learning from Human Feedback (Safe RLHF) has recently achieved empirical success in developing helpful and harmless large language models by decoupling human preferences regarding helpfulness and harmlessness. Existing approaches typically rely on fitting fixed horizon reward models from human feedback and have only been validated empirically. In this paper, we formulate safe RLHF as an infinite horizon discounted Constrained Markov Decision Process (CMDP), since humans may interact with the model over a continuing sequence of interactions rather than within a single finite episode. We propose two Safe RLHF algorithms that do not require reward model fitting and, in contrast to prior work assuming fixed-length trajectories, support flexible trajectory lengths for training. Both algorithms are based on the primal-dual method and achieve global convergence guarantees with polynomial rates in terms of policy gradient iterations, trajectory sample lengths, and human preference queries. To the best of our knowledge, this is the first work to study infinite horizon discounted CMDP under human feedback and establish global, non-asymptotic convergence.
Distributed Edge Computing Task Allocation with Network Effects
Open MIND · 2026-02-13
preprint · Senior author
Field-deployable edge computing nodes form a network and are used to complete scientific tasks for remote sensing and monitoring. The networked nodes collectively decide which scientific applications to run while they are constrained by various factors, such as differing hardware constraints from heterogeneous nodes and time-varying quality of service (QoS) requirements. We model the problem of task allocation as an optimization problem that maximizes the QoS, subject to the constraints. We solve the optimization problem using a dual-descent method, which can be easily implemented in a distributed way subject to the communication constraints of the network. Using a simulation that uses real-world data collected from Sage, a distributed sensor network, we analyze our policy's performance in dynamic situations where the required QoS and the nodes' capabilities change, and verify that it can adapt and return a feasible solution while accounting for those changes.
Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
arXiv (Cornell University) · 2026-05-07
preprint · Open access · Senior author
Optimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning -- most recently KL-Shampoo, which estimates the preconditioner via KL divergence minimization -- and orthogonalization of the gradient momentum, exemplified by Muon and analyzed as steepest descent under the spectral norm. The two routes are typically developed in isolation. We make a structural observation about KL-Shampoo's Kronecker preconditioners: their eigenvalue spectra exhibit a spike-and-flat shape -- a few dominant eigenvalues followed by an approximately uniform tail -- across layers and training stages, holding exactly under a rank-$ρ$ signal-plus-noise gradient model. We exploit this structure by restricting one of KL-Shampoo's Kronecker factors to a parametric family aligned with the spike-and-flat shape: full spectral structure on a tracked $r$-dimensional subspace, single shared eigenvalue across the remaining $n-r$ directions. On these directions, we apply orthogonalization. An identity shows that this orthogonalization recovers the algebraic form of full KL-Shampoo's preconditioner. On four pre-training scales (GPT-2 124M / 350M, LLaMA 134M / 450M), Pro-KLShampoo consistently outperforms KL-Shampoo at every subspace rank we test in validation loss, peak per-GPU memory, and wallclock time to reach each loss level.
Incentivized Federated Learning and Unlearning
IEEE Transactions on Mobile Computing · 2025-04-04 · 3 citations
article
To protect users' "right to be forgotten" in federated learning, federated unlearning aims at eliminating the impact of leaving users' data on the global learned model. The current research in federated unlearning mainly concentrates on developing effective and efficient unlearning techniques. However, the issue of incentivizing valuable users to remain engaged and preventing their data from being unlearned is still under-explored, yet important to the unlearned model performance. This paper focuses on the incentive issue and develops an incentive mechanism for federated learning and unlearning. We first characterize the leaving users' impact on the global model accuracy and the required communication rounds for unlearning. Building on these results, we propose a four-stage game to capture the interaction and information updates during the learning and unlearning process. A key contribution is to summarize users' multi-dimensional private information into one-dimensional metrics to guide the incentive design. Interestingly, we prove that allowing federated unlearning can result in reduced payoffs for both the server and users, compared to a scenario without unlearning. Numerical results demonstrate the necessity of unlearning incentives for retaining valuable leaving users, and also show that our proposed mechanisms decrease the server's cost by up to 53.91% compared to state-of-the-art benchmarks.
Optimal Battery Placement in Power Grid
ArXiv.org · 2025-07-14
preprint · Open access
We study the optimal placement of an unlimited-capacity battery in power grids under a centralized market model, where the independent system operator (ISO) aims to minimize total generation costs through load shifting. The optimal battery placement is not well understood by the existing literature, especially regarding the influence of network topology on minimizing generation costs. Our work starts with decomposing the Mixed-Integer Linear Programming (MILP) problem into a series of Linear Programming (LP) formulations. For power grids with sufficiently large generation capacity or tree topologies, we derive analytical cost expressions demonstrating that, under reasonable assumptions, the weighted degree is the only topological factor for optimal battery placement. We also discuss the minor impact of higher-order topological conditions on tree-topology networks. To find the localized nature of a single battery's impact, we establish that the relative cost-saving benefit of a single battery decreases as the network scales. Furthermore, we design a low-complexity algorithm for weakly-cyclic networks. Numerical experiments show that our algorithm is not only approximately 100 times faster than commercial solvers but also maintains high accuracy even when some theoretical assumptions are relaxed.
A Decomposition Framework for Nonlinear Nonconvex Two-Stage Optimization
arXiv (Cornell University) · 2025-01-20
preprint · Open access · Senior author
We propose a new decomposition framework for continuous nonlinear constrained two-stage optimization, where both first- and second-stage problems can be nonconvex. A smoothing technique based on an interior-point formulation renders the optimal solution of the second-stage problem differentiable with respect to the first-stage parameters. As a consequence, efficient off-the-shelf optimization packages can be utilized. We show that the solution of the nonconvex second-stage problem behaves locally like a differentiable function so that existing proofs can be applied to prove the convergence of the iterates to first-order optimal points for the first-stage. We also prove fast local convergence of the algorithm as the barrier parameter is driven to zero. Numerical experiments for large-scale instances demonstrate the computational advantages of the decomposition framework.
Incentive Analysis for Agent Participation in Federated Learning
ArXiv.org · 2025-03-12
preprint · Open access · Senior author
Federated learning offers a decentralized approach to machine learning, where multiple agents collaboratively train a model while preserving data privacy. In this paper, we investigate the decision-making and equilibrium behavior in federated learning systems, where agents choose between participating in global training or conducting independent local training. The problem is first modeled as a stage game and then extended to a repeated game to analyze the long-term dynamics of agent participation. For the stage game, we characterize the participation patterns and identify Nash equilibrium, revealing how data heterogeneity influences the equilibrium behavior: specifically, agents with similar data qualities will participate in FL as a group. We also derive the optimal social welfare and show that it coincides with Nash equilibrium under mild assumptions. In the repeated game, we propose a privacy-preserving, computationally efficient myopic strategy. This strategy enables agents to make practical decisions under bounded rationality and converges to a neighborhood of Nash equilibrium of the stage game in finite time. By combining theoretical insights with practical strategy design, this work provides a realistic and effective framework for guiding and analyzing agent behaviors in federated learning systems.

Recent grants

NRI: INT: Robotic Shepherding for Flow Control in Uncertain Dynamic Environments
NSF · $1.4M · 2020–2025

Frequent coauthors

Randall A. Berry
30 shared
Meng Zhang
12 shared
Haoran Yu
Beijing Institute of Technology
10 shared
Asuman Ozdaglar
10 shared
Charikleia Iakovidou
9 shared
Fatemeh Mansoori
Northwestern University
8 shared
Shenyinying Tu
Northwestern University
6 shared
Binnan Zhuang
6 shared

Labs

Communications and Networking Laboratory · PI

Education

MS, PhD, Electrical Engineering and Computer Science
Massachusetts Institute of Technology
2014
BS, Computer Engineering, Math, Finance
University of Maryland
2008
