
Ceyhun Eksin
Associate Professor, Industrial & Systems Engineering, Corrie and Jim Furber '64 Faculty Fellow, Affiliated Faculty, Electrical & Computer Engineering
Texas A&M University · Industrial & Systems Engineering
Active 2008–2025
About
Ceyhun Eksin is an Associate Professor in the Department of Industrial & Systems Engineering at Texas A&M University, where he also holds the Corrie and Jim Furber '64 Faculty Fellowship. He received his Ph.D. in Electrical and Systems Engineering from the University of Pennsylvania in 2015. His research focuses on the analysis and design of networked multi-agent systems, with particular interest in distributed optimization, game theory, evolutionary game theory, networks, autonomous systems, energy systems, and epidemics. His work studies how agents in these systems interact in order to improve efficiency and robustness, especially in energy markets and disease dynamics.
Research topics
- Computer Science
- Physics
- Medicine
- Economics
- Psychology
- Econometrics
- Biology
Selected publications
A Lagrangian Framework for Safe Cooperative Reinforcement Learning
2025-12-09
article · Senior author
We consider the problem of safe cooperative multiagent reinforcement learning (MARL) within the framework of a constrained multiagent Markov decision process (MDP). Agents share a common value function and learn to coordinate their actions to maximize a joint objective while adhering to system-level constraints. These constraints can enforce safety, reliability, or additional regulatory requirements governing the evolution of the multiagent system. We propose a Lagrangian-based approach, where agents iteratively solve a relaxed Lagrangian MDP using a joint learning mechanism. During execution, agents independently follow their policies, accumulating constraint violations over an epoch, which are then used to update the Lagrange multipliers. We show that continuous execution of this primal-dual algorithm produces episodes which are feasible almost surely. Further, we prove that the sequence of policies generated by the algorithm yields a nonstationary approximately optimal solution for the safe cooperative MARL problem.
The Lagrangian Method for Solving Constrained Markov Games
ArXiv.org · 2025-03-13
preprint · Open access · Senior author
We propose the concept of a Lagrangian game to solve constrained Markov games. Such games model scenarios where agents face cost constraints in addition to their individual rewards, both of which depend on the agents' joint actions and the evolving environment state over time. Constrained Markov games form the formal mechanism behind safe multiagent reinforcement learning, providing a structured model for dynamic multiagent interactions in a multitude of settings, such as autonomous teams operating under local energy and time constraints. We develop a primal-dual approach in which agents solve a Lagrangian game associated with the current Lagrange multiplier, simulate cost and reward trajectories over a fixed horizon, and update the multiplier using accrued experience. This update rule generates a new Lagrangian game, initiating the next iteration. Our key result shows that the sequence of solutions to these Lagrangian games yields a nonstationary Nash solution for the original constrained Markov game.
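The multiplier update described in this abstract can be illustrated with a generic projected dual-ascent step. This is a minimal sketch, not the paper's update rule: the abstract does not give the step size, the way costs are aggregated over the horizon, or the number of constraints, so the single scalar multiplier, the averaging, and the names `update_multiplier`, `lagrangian_reward`, `budget`, and `step_size` are all assumptions made for illustration.

```python
def update_multiplier(lam, episode_costs, budget, step_size):
    """One projected dual-ascent step on a single Lagrange multiplier.

    Assumption: the constraint is "average cost <= budget"; the paper may
    aggregate accrued costs differently.
    """
    # Average the costs accrued over the simulated horizon.
    avg_cost = sum(episode_costs) / len(episode_costs)
    # Raise the multiplier when the budget is exceeded, lower it otherwise,
    # then project back onto the nonnegative reals.
    return max(0.0, lam + step_size * (avg_cost - budget))


def lagrangian_reward(reward, cost, lam):
    """Reward of the relaxed problem: the original reward minus the priced cost."""
    return reward - lam * cost
```

In a loop of this kind, agents would solve the game with `lagrangian_reward` under the current multiplier, simulate trajectories, and call `update_multiplier` to define the next game.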
Fictitious Play in Product Markov Games with Kullback-Leibler Control Cost
2025-10-26
article
We present and analyze fictitious play for a new class of product Markov games with a Kullback-Leibler (KL) control cost. In a product Markov game, state transitions are the product of n Markov transition functions, where each agent controls its own local state transition dynamics given the common state and incurs a KL control cost for its efforts. Fictitious play entails each agent best-responding to minimize its discounted sum of instantaneous costs, which depend on the KL control cost and a state cost, given local beliefs about other agents’ policies. Agents update their beliefs about other agents’ policies upon observation of the realized states. We show that fictitious play converges asymptotically to a Nash equilibrium of a product Markov potential game. Simulation results on a multi-agent cloud radio access network confirm the convergence result for the game with non-identical payoffs and demonstrate the speed of convergence.
Learning graph-Fourier spectra of textured surface images for defect localization
Manufacturing Letters · 2024-10-01 · 2 citations
article · Open access
In the realm of industrial manufacturing, product inspection remains a significant bottleneck, with only a small fraction of manufactured items undergoing inspection for surface defects. Advances in imaging systems and AI can allow automated full inspection of manufactured surfaces. However, even the most contemporary imaging and machine learning methods perform poorly at detecting defects in images with highly textured backgrounds that stem from diverse manufacturing processes. This paper introduces an approach based on graph Fourier analysis to automatically identify defective images, as well as the crucial graph Fourier coefficients that inform the defects in the images. The approach thereby facilitates precise localization and characterization of defects amidst highly textured backgrounds. A convolutional neural network model (1D-CNN) was trained with the coefficients of the graph Fourier transform of the images as the input to identify, with classification accuracy of 99.4%, if the image contains a defect. An explainable AI method using SHAP (SHapley Additive exPlanations) was used to further analyze the trained 1D-CNN model to discern important spectral coefficients for each image. This approach sheds light on the crucial contribution of low-frequency graph eigen waveforms to precisely localize surface defects in images, thereby advancing the realization of zero-defect manufacturing.
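For readers unfamiliar with the graph Fourier transform mentioned in this abstract, the sketch below computes it on a simple path graph, where the Laplacian eigenvectors are known in closed form (they coincide with the DCT-II basis). This is an illustration under assumptions only: the abstract does not say how the image graph is built, so the path graph and the function names `path_graph_basis` and `graph_fourier_transform` are hypothetical stand-ins, not the paper's construction.

```python
import math


def path_graph_basis(n):
    """Orthonormal Laplacian eigenvectors of an n-node path graph.

    Row k is the eigenvector for eigenvalue 2 - 2*cos(pi*k/n), so low k
    corresponds to low graph frequency.
    """
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append(
            [scale * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
        )
    return basis


def graph_fourier_transform(signal):
    """Project a node signal onto each Laplacian eigenvector."""
    basis = path_graph_basis(len(signal))
    return [sum(u * x for u, x in zip(vec, signal)) for vec in basis]
```

A constant signal has all of its energy in the zero-frequency coefficient, which is the sense in which low-frequency coefficients capture smooth structure.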
SSRN Electronic Journal · 2024-01-01
preprint · Open access
Simulation-Based Optimistic Policy Iteration For Multi-Agent MDPs with Kullback-Leibler Control Cost
arXiv (Cornell University) · 2024-10-19
preprint · Open access
This paper proposes an agent-based optimistic policy iteration (OPI) scheme for learning stationary optimal stochastic policies in multi-agent Markov Decision Processes (MDPs), in which agents incur a Kullback-Leibler (KL) divergence cost for their control efforts and an additional cost for the joint state. The proposed scheme consists of a greedy policy improvement step followed by an m-step temporal difference (TD) policy evaluation step. We use the separable structure of the instantaneous cost to show that the policy improvement step follows a Boltzmann distribution that depends on the current value function estimate and the uncontrolled transition probabilities. This allows agents to compute the improved joint policy independently. We show that both the synchronous (entire state space evaluation) and asynchronous (a uniformly sampled set of substates) versions of the OPI scheme with finite policy evaluation rollout converge to the optimal value function and an optimal joint policy asymptotically. Simulation results on a multi-agent MDP with KL control cost variant of the Stag-Hare game validate our scheme's performance in terms of minimizing the cost return.
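The Boltzmann form of the improvement step can be illustrated with the standard single-agent KL-control result: minimizing the KL divergence from the uncontrolled transition row plus the discounted expected next-state value gives a policy proportional to the uncontrolled probability times the exponential of the negative discounted value. The sketch below assumes that single-agent form; the multi-agent factorization, any cost weighting, and the names `boltzmann_policy`, `passive_row`, and `gamma` are illustrative assumptions rather than the paper's notation.

```python
import math


def boltzmann_policy(passive_row, values, gamma=1.0):
    """Greedy improvement under a KL control cost (single-agent sketch).

    passive_row[j]: uncontrolled probability of moving to state j.
    values[j]:      current cost-to-go estimate of state j.
    Returns the controlled next-state distribution.
    """
    # Shift by the smallest value for numerical stability; the shift
    # cancels in the normalization below.
    shift = min(values)
    weights = [
        p * math.exp(-gamma * (v - shift)) for p, v in zip(passive_row, values)
    ]
    total = sum(weights)
    return [w / total for w in weights]
```

When the value estimate is constant across states, the improved policy reduces to the uncontrolled dynamics, since deviating would incur KL cost with no benefit.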
Learning Nash in Constrained Markov Games With an α-Potential
IEEE Control Systems Letters · 2024-01-01 · 1 citation
article · Senior author
We develop a best-response algorithm for solving constrained Markov games assuming limited violations of the potential game property. Limited violation of the potential game property means that changes in the value function due to unilateral policy alterations can be measured by the potential function up to an error $\alpha$. We show the existence of a stationary $\epsilon$-approximate constrained Nash policy whenever the set of feasible stationary policies is non-empty. In our setting, agents have access to an efficient probably approximately correct solver for a constrained Markov decision process, which they use to generate best-response policies against the other agents’ former policies. For an accuracy threshold $\epsilon > 4\alpha$, the best-response dynamics provably converge to an $\epsilon$-Nash policy in finite time with probability at least $1-\delta$, at the expense of polynomial bounds on sample complexity that scale with the reciprocal of $\epsilon$ and $\delta$.
arXiv (Cornell University) · 2024-10-26
preprint · Open access · Senior author
We propose networked policy gradient play for solving Markov potential games with continuous and/or discrete state-action pairs. During the game, agents use parametrized and differentiable policies that depend on the current state and the policy parameters of other agents. During training, agents update their policy parameters following stochastic gradients. The gradient estimation involves two consecutive episodes, generating unbiased estimators of reward and policy score functions. In addition, it involves keeping estimates of others' parameters using consensus steps given local estimates received through a time-varying communication network. In Markov potential games, there exists a potential value function among agents with gradients corresponding to the gradients of local value functions. Using this structure, we prove almost sure convergence to a stationary point of the potential value function with rate $O(1/\epsilon^2)$. Compared to previous works, our results do not require bounded policy gradients or initial agreement on the values of individual policy parameters. Numerical experiments on a dynamic multi-agent newsvendor problem verify the convergence of local beliefs and gradients. They further show that networked policy gradient play converges as fast as independent policy gradient updates, while collecting higher rewards.
Mathematical Biosciences · 2024-07-04
article
Average Submodularity of Maximizing Anticoordination in Network Games
SIAM Journal on Control and Optimization · 2024-09-20
article · Senior author
Recent grants
NSF · $503k · 2023–2028
NSF · $361k · 2020–2024
NSF · $430k · 2020–2024
Frequent coauthors
- 34 shared
Alejandro Ribeiro
University of Pennsylvania
- 23 shared
Joshua S. Weitz
University of Maryland, College Park
- 21 shared
Keith Paarporn
- 19 shared
Armita Nourmohammad
- 17 shared
Sarper Aydın
- 15 shared
Ali Jadbabaie
- 14 shared
Pooya Molavi
Northwestern University
- 12 shared
Furkan Sezer