Yash Kanoria

Merrill Lynch Professor of Workforce Transformation

Columbia University · Decision Sciences and Operations

Active 2013–2026

h-index: 18

Citations: 943

Papers: 71 (31 in the last 5 years)

Funding: $500k


About

Yashodhan Kanoria, known as Yash Kanoria, is an Associate Professor of Business in the Decision, Risk and Operations division at Columbia Business School. His core research focuses on the design and optimization of marketplaces, including matching markets and online platforms. He teaches Business Analytics in the MBA core curriculum and has also taught several research-oriented PhD classes such as Engineering Online Matching Markets, Statistical Physics, Markets and Algorithms, and Frontiers in Online Optimization. Kanoria's work addresses fundamental problems in marketplace design, dynamic matching, and resource allocation, contributing to both theoretical and applied aspects of these fields. He has been recognized with several awards including the SIGecom Test of Time award, Amazon OR best paper award, IIT Bombay YAA award, NSF CAREER Award, and ISIT student paper award. Kanoria's research integrates insights from operations research, economics, and computer science to develop efficient algorithms and mechanisms for complex market environments.

Research topics

Computer Science
Economics
Microeconomics
Mathematics
Operations management
Computer Security
Business
Mathematical optimization
Operations research
Artificial Intelligence
Marketing
Industrial organization
Engineering

Selected publications

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
ArXiv.org · 2026-05-14
article · Open access · Senior author
We study reinforcement learning in hybrid discrete-continuous action spaces, such as settings where the discrete component selects a regime (or index) and the continuous component optimizes within it -- a structure common in robotics, control, and operations problems. Standard model-free policy gradient methods rely on score-function (SF) estimators and suffer from severe credit-assignment issues in high-dimensional settings, leading to poor gradient quality. On the other hand, differentiable simulation largely sidesteps these issues by backpropagating through a simulator, but the presence of discrete actions or non-smooth dynamics yields biased or uninformative gradients. To address this, we propose Hybrid Policy Optimization (HPO), which backpropagates through the simulator wherever smoothness permits, using a mixed gradient estimator that combines pathwise and SF gradients while maintaining unbiasedness. We also show how problems with action discontinuities can be reformulated in hybrid form, further broadening its applicability. Empirically, HPO substantially outperforms PPO on inventory control and switched linear-quadratic regulator problems, with performance gaps increasing as the continuous action dimension grows. Finally, we characterize the structure of the mixed gradient, showing that its cross term -- which captures how continuous actions influence future discrete decisions -- becomes negligible near a discrete best response, thereby enabling approximate decentralized updates of the continuous and discrete components and reducing variance near optimality. All resources are available at github.com/MatiasAlvo/hybrid-rl.
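The idea of combining score-function and pathwise gradients can be illustrated on a much simpler problem than the ones in the abstract. The sketch below is a hypothetical one-step toy, not the paper's HPO implementation: a Bernoulli discrete choice with logit `theta` picks one of two regimes, a Gaussian continuous action with mean `phi[d]` is drawn by reparameterization, and the reward is an assumed quadratic. The discrete parameter gets a score-function gradient and the continuous parameters get a pathwise gradient. With a single step there is no cross term, so this does not capture the multi-step structure the abstract analyzes. All function and parameter names are illustrative.

```python
import math
import random


def mixed_gradient_estimate(theta, phi, centers, bonuses, sigma, n_samples, seed=0):
    """Monte Carlo gradient of E[reward] for a one-step hybrid policy (toy example).

    Discrete action d ~ Bernoulli(sigmoid(theta)); continuous action
    a = phi[d] + sigma * eps with eps ~ N(0, 1); assumed reward
    bonuses[d] - (a - centers[d]) ** 2.
    """
    rng = random.Random(seed)
    p1 = 1.0 / (1.0 + math.exp(-theta))
    grad_theta = 0.0
    grad_phi = [0.0, 0.0]
    for _ in range(n_samples):
        d = 1 if rng.random() < p1 else 0
        eps = rng.gauss(0.0, 1.0)
        a = phi[d] + sigma * eps  # reparameterized continuous action
        reward = bonuses[d] - (a - centers[d]) ** 2
        # Score-function term: d/dtheta log pi(d) equals (d - p1) for a Bernoulli logit.
        grad_theta += reward * (d - p1)
        # Pathwise term: d reward / d a times d a / d phi[d], where d a / d phi[d] = 1.
        grad_phi[d] += -2.0 * (a - centers[d])
    return grad_theta / n_samples, [g / n_samples for g in grad_phi]


def exact_gradient(theta, phi, centers, bonuses, sigma):
    """Closed-form gradient of the same objective, used only as a sanity check."""
    p1 = 1.0 / (1.0 + math.exp(-theta))
    probs = [1.0 - p1, p1]
    values = [bonuses[d] - (phi[d] - centers[d]) ** 2 - sigma ** 2 for d in (0, 1)]
    grad_theta = p1 * (1.0 - p1) * (values[1] - values[0])
    grad_phi = [probs[d] * (-2.0) * (phi[d] - centers[d]) for d in (0, 1)]
    return grad_theta, grad_phi
```

Calling `mixed_gradient_estimate(0.3, [0.5, -1.0], [1.0, 0.0], [0.0, 0.5], 0.3, 200000)` and comparing the result with `exact_gradient` for the same arguments gives a quick check that both parts of the estimator are unbiased in this toy setting.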
What Is Your AI Agent Buying? Evaluation, Biases, Model Dependence, & Emerging Implications of Agentic E-Commerce
2026-04-09
article · Open access
Online marketplaces will be transformed by autonomous AI agents acting on behalf of consumers. Rather than humans browsing and clicking, AI agents can parse webpages or interact through APIs to evaluate products and transact. This raises a fundamental question: what do AI agents buy, and why? We develop ACES, a sandbox environment that pairs a platform-agnostic agent with a fully programmable mock marketplace to study this. We first explore aggregate choices, revealing that modal choices can differ across models, with AI agents sometimes concentrating on a few products, raising competition questions. We then analyze the current drivers of choices through randomized experiments on product positions and listing attributes. Models show sizeable and heterogeneous position effects: all favor the top row, yet different models prefer different columns, undermining the assumption of a universal "top" rank. They penalize sponsored tags and reward endorsements; their sensitivities to price, ratings, and reviews are directionally as expected but vary sharply across models. Our findings reveal how AI agents behave in e-commerce, and surface concrete monitoring, seller strategy, platform design, and regulatory questions.
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
arXiv (Cornell University) · 2026-05-14
preprint · Open access · Senior author
What Is Your AI Agent Buying? Evaluation, Biases, Model Dependence, & Emerging Implications for Agentic E-Commerce
ArXiv.org · 2025-08-04
preprint · Open access
Online marketplaces will be transformed by autonomous AI agents acting on behalf of consumers. Rather than humans browsing and clicking, AI agents can parse webpages or leverage APIs to view, evaluate and choose products. We investigate the behavior of AI agents using ACES, a provider-agnostic framework for auditing agent decision-making. We reveal that agents can exhibit choice homogeneity, often concentrating demand on a few "modal" products while ignoring others entirely. Yet, these preferences are unstable: model updates can drastically reshuffle market shares. Furthermore, randomized trials show that while agents have improved over time on simple tasks with a clearly identified best choice, they exhibit strong position biases -- varying across providers and model versions, and persisting even in text-only "headless" interfaces -- undermining any universal notion of a "top" rank. Agents also consistently penalize sponsored tags while rewarding platform endorsements, and sensitivities to price, ratings, and reviews vary sharply across models. Finally, we demonstrate that sellers can respond: a seller-side agent making simple, query-conditional description tweaks can drive significant gains in market share. These findings reveal that agentic markets are volatile and fundamentally different from human-centric commerce, highlighting the need for continuous auditing and raising questions for platform design, seller strategy and regulation.
Impact of Rankings and Personalized Recommendations in Marketplaces
ArXiv.org · 2025-06-03
preprint · Open access
Individuals often navigate several options with incomplete knowledge of their own preferences. Information provisioning tools such as public rankings and personalized recommendations have become central to helping individuals make choices, yet their value proposition under different marketplace environments remains unexplored. This paper studies a stylized model to explore the impact of these tools in two marketplace settings: uncapacitated supply, where items can be selected by any number of agents, and capacitated supply, where each item is constrained to be matched to a single agent. We model the agent's utility as a weighted combination of a common term, which depends only on the item and reflects the item's population-level quality, and an idiosyncratic term, which depends on the agent-item pair and captures individual-specific tastes. Public rankings reveal the common term, while personalized recommendations reveal both terms. In the supply-unconstrained setting, both public rankings and personalized recommendations improve welfare, with their relative value determined by the degree of preference heterogeneity. Public rankings are effective when preferences are relatively homogeneous, while personalized recommendations become critical as heterogeneity increases. In contrast, in the supply-constrained setting, revealing just the common term, as done by public rankings, provides limited benefit since the total common value available is limited by capacity constraints, whereas personalized recommendations, by revealing both common and idiosyncratic terms, significantly enhance welfare by enabling agents to match with items they idiosyncratically value highly. These results illustrate the interplay between supply constraints and preference heterogeneity in determining the effectiveness of information provisioning tools, offering insights for their design and deployment in diverse settings.
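The utility structure described in this abstract can be explored with a small simulation. The sketch below is only an illustration under assumptions that are not taken from the paper: both terms are drawn uniformly on [0, 1], supply is uncapacitated, an uninformed agent picks an item at random, a public ranking leads every agent to the item with the highest common term, and a personalized recommendation leads each agent to the item with the highest total utility. The function name and parameters are hypothetical.

```python
import random


def simulate_welfare(weight, n_agents=2000, n_items=10, seed=0):
    """Average utility per agent under three information regimes (toy model).

    Utility of agent i for item j is
        weight * common[j] + (1 - weight) * idiosyncratic[i][j],
    so a larger `weight` means more homogeneous preferences.
    """
    rng = random.Random(seed)
    common = [rng.random() for _ in range(n_items)]
    top_ranked = max(range(n_items), key=lambda j: common[j])
    totals = {"no_information": 0.0, "public_ranking": 0.0, "personalized": 0.0}
    for _ in range(n_agents):
        idio = [rng.random() for _ in range(n_items)]
        utility = [weight * common[j] + (1.0 - weight) * idio[j]
                   for j in range(n_items)]
        # No information: the agent picks an item uniformly at random.
        totals["no_information"] += utility[rng.randrange(n_items)]
        # Public ranking: only the common term is revealed.
        totals["public_ranking"] += utility[top_ranked]
        # Personalized recommendation: both terms are revealed.
        totals["personalized"] += max(utility)
    return {name: total / n_agents for name, total in totals.items()}
```

Running `simulate_welfare(0.9)` and `simulate_welfare(0.1)` and comparing the gap between the `personalized` and `public_ranking` averages shows, in this toy setup, how the extra value of personalization grows as preferences become more heterogeneous. The capacitated case is not modeled here.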
Feature-Based Dynamic Matching
Operations Research · 2025-11-18
article
SOARing to Optimality: Smarter Matching Algorithms for On-Demand Platforms. In “Feature-Based Dynamic Matching,” Y. Chen, Y. Kanoria, A. Kumar, and W. Zhang study dynamic two-sided matching where both customers and service providers are characterized by high-dimensional feature vectors, motivated by platforms such as on-demand home services. A key finding is that myopic greedy policies, which match each customer to the best available provider, can be highly suboptimal. The authors introduce SOAR (simulate-optimize-assign-repeat), a forward-looking algorithm that balances immediate match quality against preserving valuable supply for future arrivals. Through a novel analytical framework connecting dynamic matching to empirical optimal transport, they prove that SOAR achieves near-optimal regret scaling under various distributional assumptions, establishing fundamental performance limits for feature-based matching problems and, as a by-product, resolving an open problem from prior work on dynamic spatial matching.
What Is Your AI Agent Buying? Evaluation, Implications, and Emerging Questions for Agentic E-Commerce
SSRN Electronic Journal · 2025-01-01 · 3 citations
preprint · Open access
Impact of Rankings and Personalized Recommendations in Marketplaces
2025-07-02 · 1 citation
preprint · Open access
Decision-making often requires an individual to navigate a multitude of options with incomplete knowledge of their own preferences. Information provisioning tools such as public rankings and personalized recommendations have become central to helping individuals make choices, yet their value proposition under different marketplace environments remains unexplored. This paper studies a stylized model to explore the impact of these tools in two marketplace settings: uncapacitated supply, where items can be selected by any number of agents, and capacitated supply, where each item is constrained to be matched to a single agent. We model the agent's utility as a weighted combination of a common term, which depends only on the item and reflects the item's population-level quality, and an idiosyncratic term, which depends on the agent-item pair and captures individual-specific preferences. Public rankings reveal the common term, while personalized recommendations reveal both terms.
Network-Based Detection of Wash Trading
SSRN Electronic Journal · 2025-01-01
preprint · Open access
Dynamic spatial matching
The Annals of Applied Probability · 2025-10-01
article · 1st author · Corresponding

Recent grants

CAREER: Design of Matching Markets
NSF · $500k · 2017–2023

Frequent coauthors

Pengyu Qian
16 shared
Daniela Sabán
Palo Alto University
10 shared
Jay Sethuraman
9 shared
Ramesh Johari
7 shared
Akshit Kumar
Columbia University
7 shared
Itai Ashlagi
Stanford University
7 shared
Hamid Nazerzadeh
Uber AI (United States)
7 shared
Itai Feigenbaum
Lehman College
7 shared

Labs

Yash Kanoria Lab · PI
Design and optimization of marketplaces including matching markets and online platforms
