Mochen Yang

Associate Professor · Verified

University of Minnesota · Information and Decision Sciences

Active 2013–2026

h-index: 6
Citations: 389
Papers: 56 (34 in the last 5 years)

About

I study algorithmic decision-making from both a “make” perspective and a “use” perspective. From the “make” perspective, I design theoretically robust and computationally efficient algorithms to support decision-making in information-intensive environments. From the “use” perspective, I examine the antecedents of algorithmic decision-making as well as its impact on decision quality, fairness, and privacy. My aspiration is to study real problems and develop practical solutions to support individual and organizational decision-making.

Research topics

  • Business
  • Computer Science
  • Sociology
  • Artificial Intelligence
  • Economics
  • Psychology
  • Monetary economics
  • Risk analysis (engineering)
  • Epistemology
  • Finance
  • Art
  • Mathematics
  • Anthropology
  • Philosophy
  • Management science
  • Data science
  • Knowledge management
  • Cognitive science

Selected publications

  • Regurgitative Training: The Value of Real Data in Training Large Language Models

    Management Science · 2026-05-11 · 3 citations

    preprint · Open access

    What happens if we train a new large language model (LLM) using data at least partially generated by other LLMs? The explosive success of LLMs means that content online will increasingly be generated by LLMs rather than humans, which inevitably enters the training data sets of next-generation LLMs. In this paper, we study the implications of such “regurgitative training” on LLM performance. Starting with the machine translation task (a representative language task with well-established evaluation criteria), we fine-tune LLMs with data generated either by themselves or by other LLMs, and we find strong evidence that regurgitative training handicaps the performance of fine-tuned LLMs. A comparison between LLM-generated data and real data reveals suggestive evidence that higher error rates and lower lexical diversity in LLM-generated data may be at play. Accordingly, we propose and evaluate three strategies to mitigate the performance loss by (i) prioritizing high-quality LLM-generated data, (ii) mixing data generated by multiple LLMs, and (iii) prioritizing LLM-generated data that most resemble real data. All three strategies can improve the performance of regurgitative training to some extent but cannot fully close the gap from training with real data. This highlights that real, human-generated data cannot be easily substituted by LLM-generated data in training LLMs. Additionally, we investigate regurgitative training on a creative ideation task with human judgement-based evaluations. Interestingly, we find that preference-based fine-tuning with human feedback on LLM-generated ideas can actually improve ideation performance. This showcases that human preference data when combined with LLM-generated data can bring performance gains. This paper was accepted by Hemant Bhargava, information systems. 
    Funding: This work was supported by the National Natural Science Foundation of China [Grants 72421001 and 72172070] and the Singapore Ministry of Education Academic Research Fund Tier 2 A-8003504 [Robert Brown Promising Researcher Award MOE-T2EP40]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2024.07005.

  • Agentic AI and Structured vs. Self-guided learning

    AEA Randomized Controlled Trials · 2025-07-23

    dataset
  • Robustness is Important: Limitations of LLMs for Data Fitting

    ArXiv.org · 2025-08-27

    preprint · Open access

    Large Language Models (LLMs) are being applied in a wide array of settings, well beyond the typical language-oriented use cases. In particular, LLMs are increasingly used as a plug-and-play method for fitting data and generating predictions. Prior work has shown that LLMs, via in-context learning or supervised fine-tuning, can perform competitively with many tabular supervised learning techniques in terms of predictive performance. However, we identify a critical vulnerability of using LLMs for data fitting -- making changes to data representation that are completely irrelevant to the underlying learning task can drastically alter LLMs' predictions on the same data. For example, simply changing variable names can sway the size of prediction error by as much as 82% in certain settings. Such prediction sensitivity with respect to task-irrelevant variations manifests under both in-context learning and supervised fine-tuning, for both close-weight and open-weight general-purpose LLMs. Moreover, by examining the attention scores of an open-weight LLM, we discover a non-uniform attention pattern: training examples and variable names/values which happen to occupy certain positions in the prompt receive more attention when output tokens are generated, even though different positions are expected to receive roughly the same attention. This partially explains the sensitivity in the presence of task-irrelevant variations. We also consider a state-of-the-art tabular foundation model (TabPFN) trained specifically for data fitting. Despite being explicitly designed to achieve prediction robustness, TabPFN is still not immune to task-irrelevant variations. Overall, despite LLMs' impressive predictive capabilities, currently they lack even the basic level of robustness to be used as a principled data-fitting tool.

  • What, Why, and How: An Empiricist’s Guide to Double/Debiased Machine Learning

    Information Systems Research · 2025-12-05 · 2 citations

    article

    We provide an introduction to double/debiased machine learning (DML), a framework that enables effect estimation when dealing with complex, high-dimensional data. In many empirical analyses, especially in fields such as information systems, researchers face difficult choices about which control variables to include and how to model their relationships with the outcome. These modeling decisions can significantly change results, leading to uncertainty about which findings are reliable. DML offers a practical solution by combining modern machine learning with rigorous statistical inference. The idea is to let flexible ML models (such as random forests or gradient boosting) capture complex relationships among control variables while still delivering reliable estimates for the key effect of interest. DML can be applied to many familiar research designs, including standard regression with controls, instrumental variables, difference in differences, and models that incorporate ML-generated features. Empirical studies and simulations show that DML is typically more robust to misspecification than traditional regression and more reliable than earlier semiparametric methods. However, DML is not automatic—it still requires sound research design and high-quality machine learning estimation. Used thoughtfully, DML provides a powerful, flexible, and statistically grounded approach for empirical research in modern data environments.

  • Agentic AI and Managers' Analytics Capabilities: An Exploration

    SSRN Electronic Journal · 2025-01-01

    article · Open access · Senior author
  • Improving Convergence of Flexible Combinatorial Auctions with Rationality-Based Ask Prices

    SSRN Electronic Journal · 2025-01-01

    preprint · Open access · Senior author
  • Cost-Aware Calibration of Classifiers

    INFORMS Journal on Data Science · 2024-12-09

    article · 1st author · Corresponding

    Most classification techniques in machine learning are able to produce probability predictions in addition to class predictions. However, these predicted probabilities are often not well calibrated in that they deviate from the actual outcome rates (i.e., the proportion of data instances that actually belong to a certain class). A lack of calibration can jeopardize downstream decision tasks that rely on accurate probability predictions. Although several post hoc calibration methods have been proposed, they generally do not consider the potentially asymmetric costs associated with overprediction versus underprediction. In this research, we formally define the problem of cost-aware calibration and propose a metric to quantify the cost of miscalibration for a given classifier. Next, we propose three approaches to achieve cost-aware calibration, two of which are cost-aware adaptations of existing calibration algorithms; the third one (named MetaCal) is a Bayes optimal learning algorithm inspired by prior work on cost-aware classification. We carry out systematic empirical evaluations on multiple public data sets to demonstrate the effectiveness of the proposed approaches in reducing the cost of miscalibration. Finally, we generalize the definition and metric as well as solution algorithms of cost-aware calibration to account for nonlinear cost structures that may arise in real-world decision tasks. History: David Martens served as the senior editor for this article. Data Ethics & Reproducibility Note: There are no data ethics considerations. The code capsule is available on Code Ocean at https://doi.org/10.24433/CO.8552538.v1 and in the e-Companion to this article (available at https://doi.org/10.1287/ijds.2024.0038).

  • Engaging Users on Social Media Business Pages: The Roles of User Comments and Firm Responses

    MIS Quarterly · 2024-06-01 · 11 citations

    article · Senior author

    Firms must strategically manage their responses to user comments to keep users engaged on their social media business pages. The question of whether, how, and when a firm should respond to user comments to achieve favorable outcomes is of great interest to researchers and practitioners. We focus on these questions and study the effects of initial user comments and firm responses on subsequent user engagement on social media business pages. In particular, we theorize and examine how two features of initial user comments (i.e., sentiment and controversialness) and two features of firm responses (i.e., uniqueness and timeliness) jointly affect the volume and sentiment of subsequent user comments. By analyzing data from the Facebook business pages of multiple U.S. retailers (10,312 firm posts from 37 firms and over 1 million user comments), we found that firms are more likely to respond to negative comments (than positive or neutral comments) but less likely to respond to controversial comments (which evoke diverse opinions and emotions). Further, we found that engaging with negative and controversial comments and promptly responding to comments are linked to an increase in the volume of subsequent user comments but also to a more negative sentiment in these comments. We also found that providing unique responses improves the volume and sentiment of subsequent user comments. Our findings offer theoretical and practical insights into firms’ response management on social media.

  • What, Why, and How: An Empiricist's Guide to Double/Debiased Machine Learning

    SSRN Electronic Journal · 2024-01-01

    article · Open access
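
The double/debiased machine learning (DML) guide listed above describes letting flexible ML models capture the relationship between the control variables and both the treatment and the outcome, then estimating the effect of interest from what remains. A minimal sketch of that partialling-out idea with cross-fitting might look like the following. It is an illustration under stated assumptions, not the authors' code: the simulated data, the true effect of 1.5, the bin-mean learner (standing in for a random forest or gradient boosting), and every function name here are hypothetical.

```python
import math
import random


def simulate(n=4000, theta=1.5, seed=0):
    """Hypothetical data: the control x drives both treatment d and outcome y."""
    rng = random.Random(seed)
    x = [rng.uniform(-2, 2) for _ in range(n)]
    d = [math.sin(xi) + rng.gauss(0, 1) for xi in x]
    y = [theta * di + xi ** 2 + 2 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]
    return x, d, y


def fit_bin_means(xs, ys, lo=-2.0, hi=2.0, bins=20):
    """Simple flexible learner: predict the mean of y within each x-bin."""
    width = (hi - lo) / bins

    def bin_of(x):
        return min(bins - 1, max(0, int((x - lo) / width)))

    sums, counts = [0.0] * bins, [0] * bins
    for x, y in zip(xs, ys):
        sums[bin_of(x)] += y
        counts[bin_of(x)] += 1
    overall = sum(ys) / len(ys)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


def dml_partial_out(x, d, y, folds=2):
    """Cross-fitted residual-on-residual estimate of the effect of d on y."""
    n = len(x)
    res_d, res_y = [0.0] * n, [0.0] * n
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        test = [i for i in range(n) if i % folds == k]
        # Nuisance models are fit on the other fold only (cross-fitting).
        m_hat = fit_bin_means([x[i] for i in train], [d[i] for i in train])
        l_hat = fit_bin_means([x[i] for i in train], [y[i] for i in train])
        for i in test:
            res_d[i] = d[i] - m_hat(x[i])
            res_y[i] = y[i] - l_hat(x[i])
    return sum(a * b for a, b in zip(res_d, res_y)) / sum(a * a for a in res_d)


def naive_slope(d, y):
    """Slope of y on d with no controls, for comparison."""
    md, my = sum(d) / len(d), sum(y) / len(y)
    cov = sum((a - md) * (b - my) for a, b in zip(d, y))
    return cov / sum((a - md) ** 2 for a in d)


if __name__ == "__main__":
    x, d, y = simulate()
    print("naive slope:", round(naive_slope(d, y), 3))
    print("DML estimate:", round(dml_partial_out(x, d, y), 3))
```

In this simulated setup the naive slope picks up the confounding through x and overshoots the true effect, while the cross-fitted estimate should land close to 1.5. As the abstract cautions, this does not replace sound research design or well-tuned nuisance models.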

Education

  • B.S., Information Systems and Information Management

    Tsinghua University

  • Ph.D., Information and Decision Sciences

    Carlson School of Management, University of Minnesota
