Xuan Bi

Professor of Supply Chain & Operations · Verified

University of Minnesota · Supply Chain and Operations Management

Active 2007–2026

h-index: 10

Citations: 412

Papers: 64 (37 in the last 5 years)

Funding: not listed


About

My work centers on designing and applying statistical machine learning methods and AI technologies to address large-scale, real-world business and scientific problems. Specifically, my research interests lie broadly in trustworthy machine learning and AI, with a particular focus on data privacy, watermarking, retrieval-augmented generation, adversarial learning, and recommender systems.

Research topics

Machine Learning
Computer Science
Mathematics
Artificial Intelligence
Data Mining
Theoretical computer science
Pure mathematics
Operations research
Programming language
Data science
Econometrics
Economics

Selected publications

Privacy Meets Retrieval: The Performance Consequences of De-Identification in Retrieval-Augmented Generation
SSRN Electronic Journal · 2026-01-01
preprint · Open access
AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science
arXiv (Cornell University) · 2026-03-19
preprint · Open access
Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data science workflows. However, it remains unclear to what extent AI agents can match the performance of human experts on domain-specific data science tasks, and in which aspects human expertise continues to provide advantages. We introduce AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration performance in domain-specific data science. AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking. We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines. Our results show that current AI agents struggle with domain-specific reasoning. AI-only baselines perform near or below the median of competition participants, while the strongest solutions arise from human-AI collaboration. These findings challenge the narrative of complete automation by AI and underscore the enduring importance of human expertise in data science, while illuminating directions for the next generation of AI. The AgentDS website is at https://agentds.org/ and the open-source datasets are at https://huggingface.co/datasets/lainmn/AgentDS.
Differentially Private Truncation of Unbounded Data via Public Second Moments
ArXiv.org · 2026-02-25
article · Open access
Data privacy is important in the AI era, and differential privacy (DP) is one of the golden solutions. However, DP is typically applicable only if data have a bounded underlying distribution. We address this limitation by leveraging second-moment information from a small amount of public data. We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: data dimension and sample size. This transformation yields a well-conditioned second-moment matrix, enabling its inversion with a significantly strengthened ability to resist the DP noise. Furthermore, we demonstrate the applicability of PMT by using penalized and generalized linear regressions. Specifically, we design new loss functions and algorithms, ensuring that solutions in the transformed space can be mapped back to the original domain. We have established improvements in the models' DP estimation through theoretical error bounds, robustness guarantees, and convergence results, attributing the gains to the conditioning effect of PMT. Experiments on synthetic and real datasets confirm that PMT substantially improves the accuracy and stability of DP models.
Episode Charges and Subsequent Visits After Telemedicine vs In-Person Care
JAMA Network Open · 2026-02-09
article · Open access
Importance: Telemedicine use increased during the COVID-19 pandemic and has remained a regular component of health care delivery. However, the financial implications of this change for health systems' reimbursement and utilization remain unclear. Objective: To compare 30-day episode charges and subsequent visits after telemedicine and in-person index visits. Design, Setting, and Participants: The target trial emulation conducted in this comparative effectiveness research included ambulatory in-person and telemedicine visit data from an academic health system comprising 5 hospitals in Pennsylvania from January 1 to April 30, 2024. Analyses focused on 10 high-volume clinical conditions commonly managed through telemedicine. Exposures: Telemedicine visits vs in-person visits. Main Outcomes and Measures: Outcomes included episode charges (the billed amount submitted for reimbursement to insurers and patients, excluding physician professional and facility fees for the index encounter) in an episode window from 7 days before to 30 days after the index visit and the number of subsequent visits within the episode window. Linear regression and Poisson regression with propensity score matching were conducted to adjust for demographic, clinical, socioeconomic, and contextual factors. Results: A total of 163 308 visits (108 383 [66.4%] among females; mean [SD] patient age, 49.2 [19.1] years) were included in this study. After propensity score matching, the mean 30-day episode charge was $96.60 (95% CI, $92.24-$100.96) for telemedicine encounters and $509.21 (95% CI, $500.65-$517.77) for in-person encounters (mean difference, $412.62; 95% CI, $403.01-$422.22). Additionally, telemedicine visits were associated with fewer follow-up visits per 30-day episode than were in-person visits (mean [SD], 3.44 [5.38] vs 4.44 [7.41] visits; comparative reduction, 23% [95% CI, 20%-26%]). For mental and behavioral disorders, 3 categories exhibited comparable episode charges for telemedicine vs in-person encounters: depressive disorders (-$69.47; 95% CI, -$100.90 to -$38.04), anxiety and fear-related disorders ($38.06; 95% CI, $23.14 to $52.99), and neurodevelopmental disorders (-$28.88; 95% CI, -$54.72 to -$3.04). Conclusions and Relevance: In this comparative effectiveness research using target trial emulation of outpatient telemedicine and in-person visits, telemedicine visits overall were associated with lower charges and fewer subsequent visits within the 30-day episode than were in-person visits. For mental and behavioral conditions, charges were comparable. These findings suggest that telemedicine may serve as a lower-charge alternative to in-person care without increasing the need for subsequent visits.
Leveraging Peripheral Behavioral Signals to Predict Sparse Monetization Decisions: The Parallel Sequence Network
SSRN Electronic Journal · 2026-01-01
preprint · Open access
Recommending Composite Items Using Multi-Level Preference Information: A Joint Interaction Modeling Approach
ArXiv.org · 2026-01-26
article · Open access · 1st author · Corresponding author
With the advancement of machine learning and artificial intelligence technologies, recommender systems have been increasingly used across a vast variety of platforms to efficiently and effectively match users with items. As application contexts become more diverse and complex, there is a growing need for more sophisticated recommendation techniques. One example is the composite item (for example, fashion outfit) recommendation where multiple levels of user preference information might be available and relevant. In this study, we propose JIMA, a joint interaction modeling approach that uses a single model to take advantage of all data from different levels of granularity and incorporate interactions to learn the complex relationships among lower-order (atomic item) and higher-order (composite item) user preferences as well as domain expertise (e.g., on the stylistic fit). We comprehensively evaluate the proposed method and compare it with advanced baselines through multiple simulation studies as well as with real data in both offline and online settings. The results consistently demonstrate the superior performance of the proposed approach.
Physician Use of Large Language Models: A Quantitative Study Based on Large-Scale Query-Level Data
Journal of Medical Internet Research · 2025-08-25 · 4 citations
article · Open access
Background: Generative artificial intelligence (GenAI) has rapidly emerged as a promising tool in health care. Despite its growing adoption, how physicians make use of it in medical practice has not been quantitatively studied. Existing literature has largely focused on theoretical applications or experimental validations, with limited insight into real-world physician engagement with GenAI technologies. Objective: The aim of this study was to leverage a fine-grained dataset at the query level to quantitatively examine how physicians incorporate GenAI into their clinical and research workflows. The primary objective was to analyze usage patterns over time and across physician demographics. A secondary goal was to assess potential risks to patient privacy arising from physicians' interactions with GenAI platforms. Methods: This study collected 106,942 query-and-answer pairs by 989 physicians between August 29, 2023, and April 16, 2024. We performed topic classification to identify the most prevalent use cases, examining how these use cases evolved over time and across demographics. We also developed sensitivity classifiers to detect personally identifiable information in physicians' queries to explore the potential privacy breach risks around physicians' use of GenAI. Results: Approximately 40% (396/989) of the enrolled physicians were female, 45.9% (454/989) were younger than 25 years, and 54.1% (535/989) were between 25 and 56 years of age. The majority of them worked in clinical departments (680/989, 68.8%) or medical technology departments (127/989, 12.8%). Our classification-based quantitative analyses suggest the following. First, physicians use GenAI predominantly for medical research (64,379/106,942, 60.2%) rather than clinical practice (13,100/106,942, 12.25%). Second, physicians focus more on health care-related questions (rising from 64,165/106,942, 60% to 83,415/106,942, 78%) within the first 15% (16,041/106,942) of their query sequence. Third, the use of GenAI differed across physician demographics and features. Specifically, female physicians asked a larger proportion of clinical questions (female: 0.154 vs male: 0.108; P<.001) and administration questions (female: 0.027 vs male: 0.018; P<.001) than male physicians; younger physicians posed more clinical questions (age ≤25: 0.146 vs age ∈ (25, 40]: 0.115 vs age >40: 0.103; P<.001) but fewer research questions (age ≤25: 0.580 vs age ∈ (25, 40]: 0.607 vs age >40: 0.664; P<.001) than senior physicians; and physicians accessing GenAI via computers asked more research questions (computer: 0.637 vs mobile: 0.296; P<.001), whereas physicians using mobile devices asked more clinical questions (computer: 0.107 vs mobile: 0.264; P<.001). Fourth, only 2.68% (2866/106,942) of physician queries contained sensitive information, most of which derived from writing and editing. Conclusions: Physicians are actively integrating GenAI into their professional routines, primarily leveraging it for research but also increasingly for clinical support. Usage patterns vary significantly across demographic lines, including gender, age, and device preference. Despite the presence of sensitive information in some queries, the risk of privacy breaches appears to be low.

Frequent coauthors

Annie Qu · University of California, Irvine · 29 shared
Mochen Yang · Twin Cities Orthopedics · 27 shared
Gediminas Adomavičius · 17 shared
Heping Zhang · Public Health Department · 14 shared
Long Feng · 12 shared
Xiwei Tang · Hunan First Normal University · 12 shared
Xiaotong Shen · Beijing Normal University · 9 shared
Yi Zhu · Shanghai University of Finance and Economics · 6 shared

Awards & honors

Winner of ASA Student Paper Award (SLDS Section, 2016)
