
Xuan Bi
Professor of Supply Chain & Operations · University of Minnesota · Supply Chain and Operations Management
Active 2007–2026
About
My work centers on designing and applying statistical machine learning methods and AI technologies to address large-scale, real-world business and scientific problems. Specifically, my research interests lie broadly in trustworthy machine learning and AI, with a particular focus on data privacy, watermarking, retrieval-augmented generation, adversarial learning, and recommender systems.
Research topics
- Machine Learning
- Computer Science
- Mathematics
- Artificial Intelligence
- Data Mining
- Theoretical computer science
- Pure mathematics
- Operations research
- Programming language
- Data science
- Econometrics
- Economics
Selected publications
SSRN Electronic Journal · 2026-01-01
Preprint · Open access
arXiv (Cornell University) · 2026-03-19
Preprint · Open access
Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data science workflow. However, it remains unclear to what extent AI agents can match the performance of human experts on domain-specific data science tasks, and in which aspects human expertise continues to provide advantages. We introduce AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration performance in domain-specific data science. AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking. We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines. Our results show that current AI agents struggle with domain-specific reasoning. AI-only baselines perform near or below the median of competition participants, while the strongest solutions arise from human-AI collaboration. These findings challenge the narrative of complete automation by AI and underscore the enduring importance of human expertise in data science, while illuminating directions for the next generation of AI. Visit the AgentDS website here: https://agentds.org/ and open source datasets here: https://huggingface.co/datasets/lainmn/AgentDS .
Differentially Private Truncation of Unbounded Data via Public Second Moments
ArXiv.org · 2026-02-25
Article · Open access
Data privacy is important in the AI era, and differential privacy (DP) is one of the golden solutions. However, DP is typically applicable only if data have a bounded underlying distribution. We address this limitation by leveraging second-moment information from a small amount of public data. We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: data dimension and sample size. This transformation yields a well-conditioned second-moment matrix, enabling its inversion with a significantly strengthened ability to resist the DP noise. Furthermore, we demonstrate the applicability of PMT by using penalized and generalized linear regressions. Specifically, we design new loss functions and algorithms, ensuring that solutions in the transformed space can be mapped back to the original domain. We have established improvements in the models' DP estimation through theoretical error bounds, robustness guarantees, and convergence results, attributing the gains to the conditioning effect of PMT. Experiments on synthetic and real datasets confirm that PMT substantially improves the accuracy and stability of DP models.
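The transform-then-truncate idea in this abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say how the public second-moment matrix is applied or give the radius formula, so the Cholesky-based whitening and the `assumed_radius` function below are assumptions made purely for illustration, and every function name is hypothetical.

```python
import math
import random


def second_moment(rows):
    """Empirical second-moment matrix (1/n) * sum of x x^T."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]


def cholesky(a):
    """Lower-triangular L with a = L L^T, for symmetric positive-definite a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def whiten(x, low):
    """Solve L z = x by forward substitution, i.e. z = L^{-1} x."""
    z = []
    for i in range(len(x)):
        z.append((x[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def truncate(z, radius):
    """Project z onto the Euclidean ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in z))
    return z if norm <= radius else [v * radius / norm for v in z]


def assumed_radius(d, n):
    """ASSUMPTION: placeholder radius using only dimension d and sample size n.
    The paper's actual formula is not stated in the abstract."""
    return math.sqrt(d * math.log(n))


def pmt_sketch(private_rows, public_rows):
    """Whiten private rows with the PUBLIC second moment, then truncate."""
    low = cholesky(second_moment(public_rows))
    radius = assumed_radius(len(private_rows[0]), len(private_rows))
    return [truncate(whiten(x, low), radius) for x in private_rows], radius


# Usage on synthetic, badly scaled Gaussian data (illustrative only).
random.seed(0)
scales = [1.0, 10.0, 100.0]
public = [[random.gauss(0, s) for s in scales] for _ in range(50)]
private = [[random.gauss(0, s) for s in scales] for _ in range(200)]
transformed, radius = pmt_sketch(private, public)
print(max(math.sqrt(sum(v * v for v in z)) for z in transformed), radius)
```

Because every transformed row has norm at most the radius, each row's contribution to a sum such as the second-moment matrix is bounded, which is the kind of finite sensitivity a DP noise mechanism requires. The noise-adding step itself is left out of this sketch.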
Episode Charges and Subsequent Visits After Telemedicine vs In-Person Care
JAMA Network Open · 2026-02-09
Article · Open access
Importance: Telemedicine use increased during the COVID-19 pandemic and has remained a regular component of health care delivery. However, the financial implications of this change for health systems' reimbursement and utilization remain unclear. Objective: To compare 30-day episode charges and subsequent visits after telemedicine and in-person index visits. Design, Setting, and Participants: The target trial emulation conducted in this comparative effectiveness research included ambulatory in-person and telemedicine visit data from an academic health system comprising 5 hospitals in Pennsylvania from January 1 to April 30, 2024. Analyses focused on 10 high-volume clinical conditions commonly managed through telemedicine. Exposures: Telemedicine visits vs in-person visits. Main Outcomes and Measures: Outcomes included episode charges (the billed amount submitted for reimbursement to insurers and patients, excluding physician professional and facility fees for the index encounter) in an episode window from 7 days before to 30 days after the index visit and the number of subsequent visits within the episode window. Linear regression and Poisson regression with propensity score matching were conducted to adjust for demographic, clinical, socioeconomic, and contextual factors. Results: A total of 163 308 visits (108 383 [66.4%] among females; mean [SD] patient age, 49.2 [19.1] years) were included in this study. After propensity score matching, the mean 30-day episode charge was $96.60 (95% CI, $92.24-$100.96) for telemedicine encounters and $509.21 (95% CI, $500.65-$517.77) for in-person encounters (mean difference, $412.62; 95% CI, $403.01-$422.22). Additionally, telemedicine visits were associated with fewer follow-up visits per 30-day episode than were in-person visits (mean [SD], 3.44 [5.38] vs 4.44 [7.41] visits; comparative reduction, 23% [95% CI, 20%-26%]). For mental and behavioral disorders, 3 categories, namely depressive disorders (-$69.47; 95% CI, -$100.90 to -$38.04), anxiety and fear-related disorders ($38.06; 95% CI, $23.14 to $52.99), and neurodevelopmental disorders (-$28.88; 95% CI, -$54.72 to -$3.04), exhibited comparable episode charges for telemedicine vs in-person encounters. Conclusions and Relevance: In this comparative effectiveness research using target trial emulation of outpatient telemedicine and in-person visits, telemedicine visits overall were associated with lower charges and fewer subsequent visits within the 30-day episode than were in-person visits. For mental and behavioral conditions, charges were comparable. These findings suggest that telemedicine may serve as a lower-charge alternative to in-person care without increasing the need for subsequent visits.
SSRN Electronic Journal · 2026-01-01
Preprint · Open access
ArXiv.org · 2026-01-26
Article · Open access · 1st author · Corresponding
With the advancement of machine learning and artificial intelligence technologies, recommender systems have been increasingly used across a vast variety of platforms to efficiently and effectively match users with items. As application contexts become more diverse and complex, there is a growing need for more sophisticated recommendation techniques. One example is the composite item (for example, fashion outfit) recommendation where multiple levels of user preference information might be available and relevant. In this study, we propose JIMA, a joint interaction modeling approach that uses a single model to take advantage of all data from different levels of granularity and incorporate interactions to learn the complex relationships among lower-order (atomic item) and higher-order (composite item) user preferences as well as domain expertise (e.g., on the stylistic fit). We comprehensively evaluate the proposed method and compare it with advanced baselines through multiple simulation studies as well as with real data in both offline and online settings. The results consistently demonstrate the superior performance of the proposed approach.
Open MIND · 2026-01-26
Preprint · 1st author · Corresponding
Differentially Private Truncation of Unbounded Data via Public Second Moments
Open MIND · 2026-02-25
Preprint
Physician Use of Large Language Models: A Quantitative Study Based on Large-Scale Query-Level Data
Journal of Medical Internet Research · 2025-08-25 · 4 citations
Article · Open access
Background: Generative artificial intelligence (GenAI) has rapidly emerged as a promising tool in health care. Despite its growing adoption, how physicians make use of it in medical practice has not been quantitatively studied. Existing literature has largely focused on theoretical applications or experimental validations, with limited insight into real-world physician engagement with GenAI technologies. Objective: The aim of this study was to leverage a fine-grained dataset at the query level to quantitatively examine how physicians incorporate GenAI into their clinical and research workflows. The primary objective was to analyze usage patterns over time and across physician demographics. A secondary goal was to assess potential risks to patient privacy arising from physicians' interactions with GenAI platforms. Methods: This study collected 106,942 query-and-answer pairs by 989 physicians between August 29, 2023, and April 16, 2024. We performed topic classification to identify the most prevalent use cases, examining how these use cases evolved over time and across demographics. We also developed sensitivity classifiers to detect personally identifiable information in physicians' queries to explore the potential privacy breach risks around physicians' use of GenAI. Results: Approximately 40% (396/989) of the enrolled physicians were female, 45.9% (454/989) were younger than 25 years, and 54.1% (535/989) were between 25 and 56 years of age. The majority of them worked in clinical departments (680/989, 68.8%) or medical technology departments (127/989, 12.8%). Our classification-based quantitative analyses suggest the following. First, physicians use GenAI predominantly for medical research (64,379/106,942, 60.2%) rather than clinical practice (13,100/106,942, 12.25%). Second, physicians focus more on health care-related questions (rising from 64,165/106,942, 60% to 83,415/106,942, 78%) within the first 15% (16,041/106,942) of their query sequence. Third, the use of GenAI differed across physician demographics and features. Specifically, female physicians asked a larger proportion of clinical questions (female: 0.154 vs male: 0.108; P<.001) and administration questions (female: 0.027 vs male: 0.018; P<.001) than male physicians; younger physicians posed more clinical questions (age ≤25: 0.146 vs age ∈ (25, 40]: 0.115 vs age >40: 0.103; P<.001) but fewer research questions (age ≤25: 0.580 vs age ∈ (25, 40]: 0.607 vs age >40: 0.664; P<.001) than senior physicians; and physicians accessing GenAI via computers asked more research questions (computer: 0.637 vs mobile: 0.296; P<.001), whereas physicians using mobile devices asked more clinical questions (computer: 0.107 vs mobile: 0.264; P<.001). Fourth, only 2.68% (2866/106,942) of physician queries contained sensitive information, the majority of which were primarily derived from writing and editing. Conclusions: Physicians are actively integrating GenAI into their professional routines, primarily leveraging it for research but also increasingly for clinical support. Usage patterns vary significantly across demographic lines, including gender, age, and device preference. Despite the presence of sensitive information in some queries, the risk of privacy breaches appears to be low.
Frequent coauthors
- Annie Qu · University of California, Irvine · 29 shared
- Mochen Yang · Twin Cities Orthopedics · 27 shared
- Gediminas Adomavičius · 17 shared
- Heping Zhang · Public Health Department · 14 shared
- Long Feng · 12 shared
- Xiwei Tang · Hunan First Normal University · 12 shared
- Xiaotong Shen · Beijing Normal University · 9 shared
- Yi Zhu · Shanghai University of Finance and Economics · 6 shared
Awards & honors
- Winner of ASA Student Paper Award (SLDS Section, 2016)