
Thodoris Lykouris
Assistant Professor, Operations Management · Massachusetts Institute of Technology
Active 2014–2026
About
Thodoris Lykouris is an Assistant Professor of Operations Management at the MIT Sloan School of Management. His research focuses on data-driven sequential decision-making, spanning across machine learning, dynamic optimization, and economics. Prior to his current position, he was a postdoctoral researcher at Microsoft Research NYC, where he was part of the machine learning group. His dissertation was selected as a finalist in the George B. Dantzig Dissertation Award competition, and his papers have been finalists in several prestigious competitions, including the INFORMS Junior Faculty Interest Group paper competition and the INFORMS Nicholson and Applied Probability Society best student paper competitions. Thodoris holds a diploma in electrical and computer engineering from the National Technical University of Athens and a PhD in computer science from Cornell University, where he was advised by Éva Tardos.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Social psychology
- Psychology
- Art
- Mathematics
- Mathematical economics
- Cognitive psychology
Selected publications
Optimal Exploration of New Products under Assortment Decisions
ArXiv.org · 2026-04-20
Article · Open access · Senior author
We study online learning for new products on a platform that makes capacity-constrained assortment decisions on which products to offer. For a newly listed product, its quality is initially unknown, and quality information propagates through social learning: when a customer purchases a new product and leaves a review, its quality is revealed to both the platform and future customers. Since reviews require purchases, the platform must feature new products in the assortment ("explore") to generate reviews to learn about new products. Such exploration is costly because customer demand for new products is lower than for incumbent products. We characterize the optimal assortments for exploration to minimize regret, addressing two questions. (1) Should the platform offer a new product alone or alongside incumbent products? The former maximizes the purchase probability of the new product but yields lower short-term revenue. Despite the lower purchase probability, we show it is always optimal to pair the new product with the top incumbent products. (2) With multiple new products, should the platform explore them simultaneously or one at a time? We show that the optimal number of new products to explore simultaneously has a simple threshold structure: it increases with the "potential" of the new products and, surprisingly, does not depend on their individual purchase probabilities. We also show that two canonical bandit algorithms, UCB and Thompson Sampling, both fail in this setting for opposite reasons: UCB over-explores while Thompson Sampling under-explores. Our results provide structural insights on how platforms should learn about new products through assortment decisions.
Human-AI Productivity Paradoxes: Modeling the Interplay of Skill, Effort, and AI Assistance
arXiv (Cornell University) · 2026-05-12
Preprint · Open access
Generative Artificial Intelligence (AI) tools are rapidly adopted in the workplace and in education, yet the empirical evidence on AI's impact remains mixed. We propose a model of human-AI interaction to better understand and analyze several mechanisms by which AI affects productivity. In our setup, human agents with varying skill levels exert utility-maximizing effort to produce certain task outcomes with AI assistance. We find that incorporating either endogeneity in skill development or in AI unreliability can induce a productivity paradox: increased levels of AI assistance may degrade productivity, leading to potentially significant shortfalls. Moreover, we examine the long-term distributional effect of AI on skill, and demonstrate that skill polarization can emerge in steady state when accounting for heterogeneity in AI literacy -- the agent's capability to identify and adapt to inaccurate AI outputs. Our results elucidate several mechanisms that may explain the emergence of human-AI productivity paradoxes and skill polarization, and identify simple measures that characterize when they arise.
Contextual Dynamic Pricing with Heterogeneous Buyers
ArXiv.org · 2025-12-10
Preprint · Open access · 1st author · Corresponding
We initiate the study of contextual dynamic pricing with a heterogeneous population of buyers, where a seller repeatedly posts prices (over $T$ rounds) that depend on the observable $d$-dimensional context and receives binary purchase feedback. Unlike prior work assuming homogeneous buyer types, in our setting the buyer's valuation type is drawn from an unknown distribution with finite support size $K_{\star}$. We develop a contextual pricing algorithm based on optimistic posterior sampling with regret $\widetilde{O}(K_{\star}\sqrt{dT})$, which we prove to be tight in $d$ and $T$ up to logarithmic terms. Finally, we refine our analysis for the non-contextual pricing case, proposing a variance-aware zooming algorithm that achieves the optimal dependence on $K_{\star}$.
Scheduling with Uncertain Holding Costs and its Application to Content Moderation
ArXiv.org · 2025-05-27
Preprint · Open access
In content moderation for social media platforms, the cost of delaying the review of a piece of content is proportional to its view trajectory, which fluctuates and is a priori unknown. Motivated by such uncertain holding costs, we consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs. We demonstrate that in the presence of such uncertain holding costs, the two canonical algorithmic principles, instantaneous-cost ($c\mu$-rule) and expected-remaining-cost ($c\mu/\theta$-rule), are suboptimal. By viewing each job as a Markovian ski-rental problem, we develop a new index-based algorithm, Opportunity-adjusted Remaining Cost (OaRC), that adjusts to the opportunity of serving jobs in the future when uncertainty partly resolves. We show that the regret of OaRC scales as $\tilde{O}(L^{1.5}\sqrt{N})$, where $L$ is the maximum length of a job's holding cost trajectory and $N$ is the system size. This regret bound shows that OaRC achieves asymptotic optimality when the system size $N$ scales to infinity. Moreover, its regret is independent of the state-space size, which is a desirable property when job states contain contextual information. We corroborate our results with an extensive simulation study based on two holding cost patterns (online ads and user-generated content) that arise in content moderation for social media platforms. Our simulations based on synthetic and real datasets demonstrate that OaRC consistently outperforms existing practice, which is based on the two canonical algorithmic principles.
Corruption-Robust Exploration in Episodic Reinforcement Learning
Mathematics of Operations Research · 2024 · 38 citations
1st author · Corresponding
- Computer Science
- Artificial Intelligence
- Cognitive psychology
We initiate the study of episodic reinforcement learning (RL) under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of multiarmed bandits. We provide a framework that modifies the aggressive exploration enjoyed by existing reinforcement learning approaches based on optimism in the face of uncertainty by complementing them with principles from action elimination. Importantly, our framework circumvents the major challenges posed by naively applying action elimination in the RL setting, as formalized by a lower bound we demonstrate. Our framework yields efficient algorithms that (a) attain near-optimal regret in the absence of corruptions and (b) adapt to unknown levels of corruption, enjoying regret guarantees that degrade gracefully in the total corruption encountered. To showcase the generality of our approach, we derive results for both tabular settings (where states and actions are finite) and linear Markov decision process settings (where the dynamics and rewards admit a linear underlying representation). Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely independent and identically distributed transitions in the bandit-feedback model for episodic reinforcement learning. Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2021.0202 .
Social Learning with Bounded Rationality: Negative Reviews Persist under Newest First
2024-07-08
Article · Open access · Senior author
The use of product reviews in online platforms is ubiquitous and it is well established that reviews play a significant role in customer purchase decisions. The process by which reviews impact product purchases can be seen as a problem of social learning, which generically studies how agents update their beliefs for an unknown quantity of interest (e.g., product quality) based on observing actions of past agents (e.g., reading reviews by past customers). The typical assumption in the literature of social learning with reviews is that, when deciding whether to purchase a product, customers consider either all reviews provided by previous customers or a summary statistic such as their average rating. However, in practice, a common scenario may be somewhere "in between" the above two assumptions: customers read a small number of reviews in detail.
Learning to Defer in Congested Systems: The AI-Human Interplay
arXiv (Cornell University) · 2024-02-19 · 4 citations
Preprint · Open access · 1st author · Corresponding
High-stakes applications rely on combining Artificial Intelligence (AI) and humans for responsive and reliable decision making. For example, content moderation in social media platforms often employs an AI-human pipeline to promptly remove policy violations without jeopardizing legitimate content. A typical heuristic estimates the risk of incoming content and uses fixed thresholds to decide whether to auto-delete the content (classification) and whether to send it for human review (admission). This approach can be inefficient as it disregards the uncertainty in AI's estimation, the time-varying element of content arrivals and human review capacity, and the selective sampling in the online dataset (humans only review content filtered by the AI). In this paper, we introduce a model to capture such an AI-human interplay. In this model, the AI observes contextual information for incoming jobs, makes classification and admission decisions, and schedules admitted jobs for human review. During these reviews, humans observe a job's true cost and may overturn an erroneous AI classification decision. These reviews also serve as new data to train the AI but are delayed due to congestion in the human review system. The objective is to minimize the costs of eventually misclassified jobs. We propose a near-optimal learning algorithm that carefully balances the classification loss from a selectively sampled dataset, the idiosyncratic loss of non-reviewed jobs, and the delay loss of having congestion in the human review system. To the best of our knowledge, this is the first result for online learning in contextual queueing systems. Moreover, numerical experiments based on online comment datasets show that our algorithm can substantially reduce the number of misclassifications compared to existing content moderation practice.
Social Learning with Limited Attention: Negative Reviews Persist under Newest First
arXiv (Cornell University) · 2024-06-11 · 1 citation
Preprint · Open access · Senior author
We study a model of social learning from reviews where customers are computationally limited and make purchases based on reading only the first few reviews displayed by the platform. Under this limited attention, we establish that the review ordering policy can have a significant impact. In particular, the popular Newest First ordering induces a negative review to persist as the most recent review longer than a positive review. This phenomenon, which we term the Cost of Newest First, can make the long-term revenue unboundedly lower than a counterpart where reviews are exogenously drawn for each customer. We show that the impact of the Cost of Newest First can be mitigated under dynamic pricing, which allows the price to depend on the set of displayed reviews. Under the optimal dynamic pricing policy, the revenue loss is at most a factor of 2. On the way, we identify a structural property for this optimal dynamic pricing: the prices should ensure that the probability of a purchase is always the same, regardless of the state of reviews. We also consider a setting where product quality evolves over time according to a Markov chain; we find that Newest First better tracks current quality but still leads to lower revenue, highlighting a trade-off between customer belief accuracy and revenue. Finally, we support our theoretical findings with numerical simulations and an empirical analysis on reviews from Tripadvisor.
Frequent coauthors
- Éva Tardos · 13 shared
- Daniel Freund · Massachusetts Institute of Technology · 11 shared
- Akshay Krishnamurthy · 9 shared
- Karthik Sridharan · 7 shared
- Robert E. Schapire · 7 shared
- Chara Podimata · 6 shared
- Siddhartha Banerjee · Cornell University · 6 shared
- Renato Paes Leme · 6 shared
Labs
MIT Sloan Operations Management · PI
Awards & honors
- George B. Dantzig Dissertation Award finalist
- INFORMS Junior Faculty Interest Group paper competition fina…
- INFORMS Nicholson and Applied Probability Society best stude…
- Google PhD Fellowship
- Cornell University Fellowship