
Shancong Mou
Verified · University of Minnesota · Industrial and Systems Engineering
Active 2017–2026
About
Shancong Mou is an Assistant Professor in the Department of Industrial and Systems Engineering at the University of Minnesota, Twin Cities. He earned his Ph.D. in Industrial Engineering from the H. Milton Stewart School of Industrial and Systems Engineering at Georgia Tech in 2024, advised by Prof. Jianjun Shi. He also holds an M.S. in Computational Science and Engineering from Georgia Tech and a B.Eng. in Energy and Power Engineering from Xi'an Jiaotong University, China. His research focuses on mathematical optimization, statistical machine learning, and their applications. His honors include best paper awards from the QCRE Division of IISE and the QSR Section of INFORMS, the Outstanding Graduate Student Instructor of the Year award at Georgia Tech, and several research scholarships. He is currently accepting new undergraduate and graduate research students.
Research topics
- Computer Science
- Artificial Intelligence
- Data Mining
- Machine Learning
- Mathematics
- Mathematical optimization
- Structural engineering
- Control engineering
- Algorithm
- Electronic engineering
- Engineering
Selected publications
Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation
Open MIND · 2026-02-11
preprint · Senior author
In this work, we propose Natural Hypergradient Descent (NHGD), a new method for solving bilevel optimization problems. To address the computational bottleneck in hypergradient estimation, namely the need to compute or approximate the inverse Hessian, we exploit the statistical structure of the inner optimization problem and use the empirical Fisher information matrix as an asymptotically consistent surrogate for the Hessian. This design enables a parallel optimize-and-approximate framework in which the inverse-Hessian approximation is updated synchronously with the stochastic inner optimization, reusing gradient information at negligible additional cost. Our main theoretical contribution establishes high-probability error bounds and sample complexity guarantees for NHGD that match those of state-of-the-art optimize-then-approximate methods, while significantly reducing computation time. Empirical evaluations on representative bilevel learning tasks further demonstrate the practical advantages of NHGD, highlighting its scalability and effectiveness in large-scale machine learning settings.
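In implicit-differentiation approaches to bilevel optimization, the hypergradient contains a product of the form $H^{-1}v$, where $H$ is the Hessian of the inner objective with respect to the inner variable. The sketch below is a generic numerical illustration of the Fisher-for-Hessian substitution, not the authors' NHGD implementation. It assumes a well-specified linear-Gaussian inner problem with unit noise variance, so that each per-sample loss is a negative log-likelihood and the empirical Fisher matrix approximates the Hessian. The helper names and the vector `v` (a stand-in for the outer gradient with respect to the inner variable) are hypothetical.

```python
import numpy as np

def make_data(n, d, seed=0):
    """Synthetic well-specified model b = A w_true + eps with eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    b = A @ w_true + rng.standard_normal(n)
    return A, b

def per_sample_grads(A, b, w):
    """Row i is the gradient of l_i(w) = 0.5 * (b_i - a_i^T w)^2 (a unit-variance Gaussian NLL)."""
    r = b - A @ w
    return -r[:, None] * A

def empirical_fisher(G):
    """Average outer product of per-sample gradients, (1/n) * sum_i g_i g_i^T."""
    return G.T @ G / G.shape[0]

def hessian(A):
    """Exact Hessian of the average inner loss, (1/n) * A^T A."""
    return A.T @ A / A.shape[0]

A, b = make_data(n=20_000, d=5)
w_hat, *_ = np.linalg.lstsq(A, b, rcond=None)    # solution of the inner problem
G = per_sample_grads(A, b, w_hat)                # gradients an inner solver already computes
v = np.ones(A.shape[1])                           # hypothetical outer gradient w.r.t. the inner variable

exact = np.linalg.solve(hessian(A), v)            # H^{-1} v
approx = np.linalg.solve(empirical_fisher(G), v)  # F^{-1} v, the Fisher-based surrogate
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error of F^-1 v vs H^-1 v: {rel_err:.3f}")
```

Increasing `n` shrinks the gap, which is the asymptotic-consistency property the abstract relies on; under model misspecification the two matrices need not agree. The sketch also computes the Fisher matrix once at the inner solution, whereas the abstract describes updating the approximation alongside the inner iterations.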
Natural Hypergradient Descent: Algorithm Design, Convergence Analysis, and Parallel Implementation
ArXiv.org · 2026-02-11
article · Open access · Senior author
Uni-3DAD: GAN-inversion aided universal 3D anomaly detection on model-free products
Expert Systems with Applications · 2025-01-29 · 6 citations
article
Synth4Seg: Learning Defect Data Synthesis for Defect Segmentation Using Bi-Level Optimization
IEEE Transactions on Automation Science and Engineering · 2025-01-01 · 1 citation
article · 1st author · Corresponding
Defect segmentation is crucial for quality control in advanced manufacturing, yet data scarcity poses challenges for state-of-the-art supervised deep learning. Synthetic defect data generation is a popular approach for mitigating data challenges. However, many current methods simply generate defects following a fixed set of rules, which may not directly relate to downstream task performance. This can lead to suboptimal performance and may even hinder the downstream task. To solve this problem, we propose a novel bi-level optimization-based synthetic defect data generation framework. We use an online synthetic defect generation module grounded in the commonly used Cut&Paste framework, and adopt an efficient gradient-based optimization algorithm to solve the bi-level optimization problem. We simultaneously train the defect segmentation network and learn various parameters of the data synthesis module by maximizing the validation performance of the trained defect segmentation network. Our experimental results on benchmark datasets under limited-data settings show that the proposed bi-level optimization method can learn the most effective locations for pasting synthetic defects, thereby improving segmentation performance by up to 21.3% compared to pasting defects at random locations. We also demonstrate a performance gain of up to 2.7% by learning importance weights for different augmentation-specific defect data sources, compared to giving equal importance to all data sources.
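The idea of learning per-source importance weights by maximizing validation performance can be illustrated with a deliberately tiny bilevel problem. This sketch is not the Synth4Seg algorithm: it replaces the segmentation network with a linear least-squares model, replaces the augmentation-specific defect sources with two hypothetical synthetic sources (one clean, one corrupted), and approximates the hypergradient by differentiating through a single unrolled inner gradient step. That one-step approximation is an assumption made here, not a detail taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200
w_star = np.array([1.0, -2.0, 0.5])   # ground-truth model
w_bad = w_star + 3.0                  # model behind a hypothetical corrupted source

def make_source(w, noise=0.1):
    X = rng.standard_normal((n, d))
    return X, X @ w + noise * rng.standard_normal(n)

sources = [make_source(w_star), make_source(w_bad)]  # source 0 clean, source 1 corrupted
X_val, y_val = make_source(w_star)                   # clean validation set

def mse_grad(X, y, w):
    """Gradient of mean((X w - y)^2) with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

w = np.zeros(d)        # inner variable: model weights
alpha = np.zeros(2)    # outer variable: logits of the per-source importance weights
eta, outer_lr = 0.1, 0.5

for step in range(500):
    p = softmax(alpha)
    g = np.stack([mse_grad(X, y, w) for X, y in sources])  # per-source gradients, shape (2, d)
    w_next = w - eta * p @ g                               # one weighted inner gradient step
    v = mse_grad(X_val, y_val, w_next)                     # validation gradient at updated weights
    s = g @ v                                              # s_k = v . g_k
    # Chain rule through the softmax: dL_val/dalpha_j = -eta * p_j * (s_j - sum_k p_k s_k)
    hypergrad = -eta * p * (s - p @ s)
    alpha -= outer_lr * hypergrad                          # outer (hyperparameter) update
    w = w_next                                             # keep training the inner model

print("learned source weights:", np.round(softmax(alpha), 3))
```

Because the corrupted source pulls the model away from the validation optimum, the outer updates shift weight toward the clean source while the inner model keeps training, a small analogue of training the network and the synthesis parameters simultaneously.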
Uni-3DAD: GAN-Inversion Aided Universal 3D Anomaly Detection on Model-free Products
arXiv (Cornell University) · 2024-08-29 · 1 citation
preprint · Open access
Anomaly detection is a long-standing challenge in manufacturing systems. Traditionally, anomaly detection has relied on human inspectors. However, 3D point clouds have gained attention due to their robustness to environmental factors and their ability to represent geometric data. Existing 3D anomaly detection methods generally fall into two categories. The first compares scanned 3D point clouds with design files, assuming these files are always available. However, this assumption is often violated in real-world applications involving model-free products, such as fresh produce (e.g., "Cookie", "Potato"), dentures, and bone. The second category compares patches of scanned 3D point clouds with a library of normal patches called a memory bank. However, these methods usually fail to detect incomplete shapes, a fairly common defect type (e.g., missing pieces of a product). The main challenge is that missing areas in a 3D point cloud correspond to the absence of scanned points, which makes it infeasible to compare the missing region with existing point cloud patches in the memory bank. To address these two challenges, we propose a unified, unsupervised 3D anomaly detection framework capable of identifying all types of defects on model-free products. Our method integrates two detection modules: a feature-based detection module and a reconstruction-based detection module. The feature-based module covers geometric defects, such as dents, holes, and cracks, while the reconstruction-based module detects missing regions. Additionally, we employ a one-class support vector machine (OCSVM) to fuse the detection results from both modules. The results demonstrate that (1) our proposed method outperforms state-of-the-art methods in identifying incomplete shapes and (2) it maintains performance comparable to state-of-the-art methods in detecting all other types of anomalies.
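The OCSVM fusion step can be sketched with scikit-learn's `OneClassSVM`. This is an illustration under stated assumptions rather than the Uni-3DAD pipeline: the two per-sample scores are synthetic standardized numbers standing in for the outputs of the feature-based and reconstruction-based modules, and the kernel settings (`kernel="rbf"`, `gamma="scale"`, `nu=0.05`) are arbitrary choices.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical standardized anomaly scores for 500 defect-free training parts:
# column 0 = feature-based module score, column 1 = reconstruction-based module score.
train_scores = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Fit the one-class SVM on defect-free samples only; nu bounds the fraction of
# training points allowed to fall outside the learned normal region.
fusion = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(train_scores)

test_scores = np.array([
    [0.0, 0.0],  # typical defect-free part
    [6.0, 0.0],  # only the feature-based score is high (e.g., a dent or crack)
    [0.0, 6.0],  # only the reconstruction-based score is high (e.g., a missing piece)
])
labels = fusion.predict(test_scores)             # +1 = normal, -1 = anomalous
margins = fusion.decision_function(test_scores)  # larger = more normal
print(labels, np.round(margins, 3))
```

Fitting one boundary on the joint score vector lets a part be flagged when either module reports an unusual score, which matches the division of labor described above (geometric defects versus missing regions).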
Heavy-traffic queue length behavior in a switch under Markovian arrivals
Advances in Applied Probability · 2024-03-01
article · Open access · 1st author · Corresponding
This paper studies the input-queued switch operating under the MaxWeight algorithm when arrivals follow a Markovian process. We exactly characterize the heavy-traffic scaled mean sum queue length in the heavy-traffic limit and show that it is within a factor of less than 2 of a universal lower bound. Moreover, we obtain lower and upper bounds that apply in all traffic regimes and become tight in the heavy-traffic regime. We obtain these results by generalizing the drift method, recently developed for independent and identically distributed arrivals, to Markovian arrivals. We illustrate this generalization by first obtaining the heavy-traffic mean queue length and its distribution in a single-server queue under Markovian arrivals and then applying it to the input-queued switch. The key idea is to exploit the geometric mixing of finite-state Markov chains and to work with a time horizon chosen so that the error due to mixing depends on the heavy-traffic parameter.
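To make the setting concrete, the sketch below simulates a 2x2 input-queued switch under MaxWeight scheduling with Markov-modulated arrivals. It does not reproduce the paper's drift-method analysis or bounds; the on/off modulating chain, its parameters (`p_stay`, `rate_on`), and the helper names are hypothetical choices for illustration.

```python
import numpy as np

# The two perfect matchings (schedules) of a 2x2 input-queued switch:
# S[i, j] = 1 means input i is connected to output j in this time slot.
MATCHINGS = [np.array([[1, 0], [0, 1]]), np.array([[0, 1], [1, 0]])]

def maxweight_schedule(Q):
    """Return the matching maximizing sum_ij Q[i, j] * S[i, j] (ties go to the first)."""
    weights = [int(np.sum(Q * S)) for S in MATCHINGS]
    return MATCHINGS[int(np.argmax(weights))]

def simulate(T=20_000, p_stay=0.9, rate_on=0.8, seed=0):
    """Time-average total queue length under MaxWeight with Markov-modulated arrivals.

    Each virtual output queue (i, j) has a hypothetical two-state (on/off) chain that
    keeps its state with probability p_stay; when "on", one packet arrives with
    probability rate_on. The stationary on-probability is 1/2, so each input and
    output carries load 2 * 0.5 * rate_on (0.8 with the defaults).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2), dtype=np.int64)
    on = rng.random((2, 2)) < 0.5
    total = 0
    for _ in range(T):
        S = maxweight_schedule(Q)                             # schedule from current queues
        arrivals = (on & (rng.random((2, 2)) < rate_on)).astype(np.int64)
        Q = np.maximum(Q + arrivals - S, 0)                   # serve, clipping unused service
        total += int(Q.sum())
        flip = rng.random((2, 2)) >= p_stay                   # advance the modulating chains
        on = np.where(flip, ~on, on)
    return total / T

print(f"mean total queue length: {simulate():.2f}")
```

Pushing `rate_on` toward 1 drives the load toward 1 and the mean queue length up sharply, which is the heavy-traffic regime the paper characterizes; raising `p_stay` makes arrivals burstier (slower mixing) at the same load, which typically lengthens queues.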
IISE Transactions · 2024-09-27 · 2 citations
article
Technometrics · 2024-03-26 · 5 citations
article
In recent years, diverse measurements, such as scalars, waveform signals, images, and structured point clouds, have made it possible to capture system dynamics from a more comprehensive perspective in system modeling and analysis. To handle such multimodal structured high-dimensional (SHD) data, combining large amounts of data from multiple sites is necessary (i) to reduce the inherent population bias of a single site and (ii) to increase model accuracy. However, because of data management policies and storage costs, data cannot easily be shared or directly exchanged among sites. Instead of simplifying or facilitating the data query process, we propose a federated multiple tensor-on-tensor regression (FedMTOT) framework that trains each site's system model locally using (i) its own data and (ii) data features (not the data itself) from other sites. Specifically, federated computation is executed using the alternating direction method of multipliers (ADMM) to satisfy data-sharing requirements, while the individual model at each site still benefits from feature knowledge from other sites to improve its accuracy. Finally, two simulations and two case studies validate the advantages of the proposed FedMTOT framework.
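The ADMM mechanism can be sketched with a much simpler model than FedMTOT. The code below uses global-consensus ADMM (scaled form) for ordinary least-squares regression split across three hypothetical sites: each site solves a local problem on its private data and shares only a parameter-sized vector for averaging. FedMTOT itself uses tensor-on-tensor regression and exchanges data features; the data, `rho`, and function names here are illustrative assumptions.

```python
import numpy as np

def consensus_admm(sites, rho, iters=200):
    """Global-consensus ADMM (scaled form) for sum_k 0.5 * ||A_k x - b_k||^2.

    Each site k keeps (A_k, b_k) private; only x_k + u_k is sent for averaging.
    """
    d = sites[0][0].shape[1]
    K = len(sites)
    z = np.zeros(d)
    u = np.zeros((K, d))
    # Form each site's local system (A_k^T A_k + rho I) and A_k^T b_k once
    lhs = [A.T @ A + rho * np.eye(d) for A, _ in sites]
    rhs0 = [A.T @ b for A, b in sites]
    for _ in range(iters):
        # Local step: argmin_x 0.5 * ||A_k x - b_k||^2 + (rho / 2) * ||x - z + u_k||^2
        x = np.stack([np.linalg.solve(lhs[k], rhs0[k] + rho * (z - u[k])) for k in range(K)])
        z = (x + u).mean(axis=0)   # server averages what the sites share
        u = u + x - z              # scaled dual update (running disagreement)
    return z

rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0, 2.0, 0.5])
sites = []
for _ in range(3):
    A = rng.standard_normal((50, 4))
    sites.append((A, A @ w_true + 0.1 * rng.standard_normal(50)))

z = consensus_admm(sites, rho=50.0)
A_all = np.vstack([A for A, _ in sites])
b_all = np.concatenate([b for _, b in sites])
w_central, *_ = np.linalg.lstsq(A_all, b_all, rcond=None)  # pooled-data reference fit
print("max |ADMM - pooled fit|:", np.max(np.abs(z - w_central)))
```

At convergence the consensus variable matches the least-squares fit on the pooled data, even though no site reveals its rows. For this convex problem, `rho` affects only the speed of convergence, not the fixed point.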
Synergy of Engineering and Statistics: Multimodal Data Fusion for Quality Improvement
Springer Optimization and Its Applications · 2024-01-01
book-chapter · Senior author
Synth4Seg: Learning Defect Data Synthesis for Defect Segmentation using Bi-level Optimization
arXiv (Cornell University) · 2024-10-24
preprint · Open access · 1st author · Corresponding
Defect segmentation is crucial for quality control in advanced manufacturing, yet data scarcity poses challenges for state-of-the-art supervised deep learning. Synthetic defect data generation is a popular approach for mitigating data challenges. However, many current methods simply generate defects following a fixed set of rules, which may not directly relate to downstream task performance. This can lead to suboptimal performance and may even hinder the downstream task. To solve this problem, we propose a novel bi-level optimization-based synthetic defect data generation framework. We use an online synthetic defect generation module grounded in the commonly used Cut&Paste framework, and adopt an efficient gradient-based optimization algorithm to solve the bi-level optimization problem. We simultaneously train the defect segmentation network and learn various parameters of the data synthesis module by maximizing the validation performance of the trained defect segmentation network. Our experimental results on benchmark datasets under limited-data settings show that the proposed bi-level optimization method can learn the most effective locations for pasting synthetic defects, thereby improving segmentation performance by up to 18.3% compared to pasting defects at random locations. We also demonstrate a performance gain of up to 2.6% by learning importance weights for different augmentation-specific defect data sources, compared to giving equal importance to all data sources.
Frequent coauthors
- Jianjun Shi, Fudan University (24 shared)
- Chuck Zhang (8 shared)
- Siva Theja Maguluri, Georgia Institute of Technology (7 shared)
- Meng Cao, Xi'an University of Architecture and Technology (7 shared)
- Jiulong Shan (7 shared)
- Zaiwei Chen (5 shared)
- Haoping Bai (5 shared)
- Ping Huang, Xi'an Jiaotong University (4 shared)
Awards & honors
- Best Track Paper Award (Winner), QCRE Division of IISE (2023…
- Best Student Paper Award (Winner), QSR Section of INFORMS (2…
- Best Track Paper Award (Finalist), DAIS Division of IISE (20…
- Best Student Paper Award (Finalist), DAIS Division of IISE (…
- Best Student Paper Award (Finalist), DAIS Division of IISE (…