
Kai-Wei Chang
Associate Professor · University of California, Los Angeles · Computer Science
Active 2003–2026
About
Kai-Wei Chang is an associate professor in the Department of Computer Science at the UCLA Samueli School of Engineering. His research focuses on building intelligent systems that solve real-world problems by automatically acquiring knowledge. This involves developing machine learning components that can efficiently make coherent decisions for problems with complex structures, as well as natural language understanding components that enable systems to extract knowledge from unstructured text. He has published broadly in machine learning, natural language processing, artificial intelligence, and data mining. Chang's work aims to advance the understanding and application of statistical approaches to natural language processing and of tractable machine learning methods for handling complex, large-scale data.
Research topics
- Computer Science
- Machine Learning
- Artificial Intelligence
- Data Mining
- Theoretical Computer Science
Selected publications
medRxiv · 2026-04-22
Article · Open access
Ambulatory electrocardiograms (ECGs) provide continuous monitoring of the heart's electrical activity. However, many existing machine learning and artificial intelligence models for analyzing ambulatory ECG traces are unimodal and do not incorporate patient clinical context. In this study, we propose a multimodal framework that integrates ambulatory ECG-derived representations with clinical text embeddings to predict two cardiac outcomes: sudden cardiac death and pump failure death. Ambulatory ECG traces are preprocessed, segmented, and encoded with a multiple instance learning and temporal convolutional neural network framework. In parallel, patient clinical features are parsed into structured prompts and passed through a large language model to generate clinical reasoning; this reasoning is then passed through a biomedical language encoder to produce a text embedding. Using the ECG and text embeddings, we systematically evaluate multiple fusion strategies, including concatenation- and gating-based approaches, for integrating the two modalities. Our results demonstrate that multimodal models consistently outperform unimodal baselines, with adaptive fusion mechanisms providing the greatest improvements in predictive performance. Decision curve analysis highlights the potential clinical utility of the proposed framework for risk stratification. Finally, we visualize model attention across modalities, including ECG attention patterns, segment-level saliency, heart rate variability features, and clinical reasoning, to contextualize patient-specific predictions.
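The abstract compares concatenation- and gating-based fusion of the ECG and text embeddings. As a rough illustration only, the sketch below shows one generic form of element-wise gated fusion for two equal-length vectors. The function names, the per-dimension sigmoid gate, and the equal-dimension requirement are assumptions made for illustration; the paper's actual fusion architecture is not described here.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def concat_fusion(ecg: Vector, text: Vector) -> Vector:
    """Concatenation baseline: place the two embeddings end to end."""
    return ecg + text


def gated_fusion(ecg: Vector, text: Vector,
                 weights: List[Vector], bias: Vector) -> Vector:
    """Element-wise gated fusion (illustrative assumption, not the paper's model).

    For each dimension i, a gate g_i = sigmoid(w_i . [ecg; text] + b_i)
    decides how much of the ECG value versus the text value to keep.
    `weights` holds one row of length 2 * d per output dimension.
    """
    if len(ecg) != len(text):
        raise ValueError("embeddings must have the same dimension")
    joint = concat_fusion(ecg, text)
    fused = []
    for i, (e, t) in enumerate(zip(ecg, text)):
        logit = sum(w * x for w, x in zip(weights[i], joint)) + bias[i]
        g = sigmoid(logit)
        fused.append(g * e + (1.0 - g) * t)
    return fused
```

With all-zero weights and biases every gate equals 0.5, so `gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])` simply averages the two embeddings; learned weights would let the model lean on whichever modality is more informative per dimension, which is the intuition behind "adaptive" fusion.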
2025-06-10 · 2 citations
Article
Despite inheriting security measures from their underlying language models, Vision-Language Models (VLMs) may still be vulnerable to safety alignment issues. Through empirical analysis, we uncover two critical findings: scenario-matched images can significantly amplify harmful outputs, and, contrary to common assumptions in gradient-based attacks, minimal loss values do not guarantee optimal attack effectiveness. Building on these insights, we introduce MLAI (Multi-Loss Adversarial Images), a novel jailbreak framework that leverages scenario-aware image generation for semantic alignment, exploits flat minima theory for robust adversarial image selection, and employs multi-image collaborative attacks for enhanced effectiveness. Extensive experiments demonstrate MLAI's significant impact, achieving attack success rates of 77.75% on MiniGPT-4 and 82.80% on LLaVA-2 and outperforming existing methods by margins of 34.37% and 12.77%, respectively. Furthermore, MLAI shows considerable transferability to commercial black-box VLMs, achieving success rates of up to 60.11%. Our work reveals fundamental visual vulnerabilities in the safety mechanisms of current VLMs and underscores the need for stronger defenses. Warning: This paper contains potentially harmful example text.
Information · 2025-11-15
Article · Open access
A widely used repository of violent death records is the U.S. Centers for Disease Control National Violent Death Reporting System (NVDRS). The NVDRS includes narrative data, which researchers frequently use to go beyond its structured variables. Prior work has shown that NVDRS narratives vary in length depending on decedent and incident characteristics, including race/ethnicity. Whether these length differences reflect differences in narrative information potential is unclear. We use the 2003–2021 NVDRS to investigate narrative length and complexity among 300,323 suicides varying in decedent and incident characteristics. To do so, we operationalized narrative complexity using three manifest measures: word count, sentence count, and dependency tree depth. We then employed regression methods to predict word counts and narrative complexity scores from decedent and incident characteristics. Both were consistently lower for black non-Hispanic decedents than for white non-Hispanic decedents. Although narrative complexity is just one aspect of narrative information potential, these findings suggest that the information in NVDRS narratives is more limited for some racial/ethnic minorities. Future studies, possibly leveraging large language models, are needed to develop robust measures for determining whether NVDRS narratives have achieved their stated goal of fully describing the circumstances of suicide.
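The three manifest measures named in the NVDRS abstract (word count, sentence count, dependency tree depth) can be computed once a narrative has been sentence-split and dependency-parsed. This is a minimal sketch assuming parser output given as CoNLL-U-style head indices (1-based, head 0 marks the root). Counting only tokens containing an alphanumeric character as words and taking the maximum depth across sentences are illustrative choices; the study may define or aggregate these measures differently.

```python
from typing import Dict, List, Tuple

# Assumed input format: one (token, head) pair per token, CoNLL-U style.
# Heads are 1-based token positions; head 0 marks the sentence root.
Sentence = List[Tuple[str, int]]


def tree_depth(sentence: Sentence) -> int:
    """Tokens on the longest root-to-leaf path (root has depth 1).

    Assumes a well-formed tree; a cyclic head list would recurse forever.
    """
    heads = [head for _, head in sentence]
    memo: Dict[int, int] = {}

    def depth(i: int) -> int:  # i is a 1-based token position
        if i not in memo:
            head = heads[i - 1]
            memo[i] = 1 if head == 0 else 1 + depth(head)
        return memo[i]

    return max(depth(i) for i in range(1, len(sentence) + 1))


def complexity_measures(narrative: List[Sentence]) -> Dict[str, int]:
    """Word count, sentence count, and maximum dependency tree depth."""
    # Illustrative choice: punctuation-only tokens are not counted as words.
    words = sum(
        1
        for sent in narrative
        for token, _ in sent
        if any(ch.isalnum() for ch in token)
    )
    return {
        "word_count": words,
        "sentence_count": len(narrative),
        "max_tree_depth": max((tree_depth(s) for s in narrative), default=0),
    }
```

For a hand-parsed toy sentence such as "The victim was found ." (heads `[2, 4, 4, 0, 4]`), the longest path runs found → victim → The, giving a depth of 3.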
SemEval-2025 Task 4: Unlearning sensitive content from Large Language Models
ArXiv.org · 2025-04-02
Preprint · Open access
We introduce SemEval-2025 Task 4: unlearning sensitive content from Large Language Models (LLMs). The task features three subtasks for LLM unlearning spanning different use cases: (1) unlearn long-form synthetic creative documents spanning different genres; (2) unlearn short-form synthetic biographies containing personally identifiable information (PII), including fake names, phone numbers, SSNs, email addresses, and home addresses; and (3) unlearn real documents sampled from the target model's training dataset. We received over 100 submissions from over 30 institutions, and we summarize the key techniques and lessons in this paper.
Academic Emergency Medicine · 2025-07-29
Article · Open access
OBJECTIVES: Emergency department (ED) visits offer opportunities for seriously ill patients to formulate future medical care goals, yet ED clinicians lack practical strategies for these conversations. ED GOAL, a behavioral intervention, engages seriously ill yet clinically stable older adults in the ED to address advance care planning (ACP) with their outpatient clinicians. In a randomized trial, goals-of-care documentation was significantly higher in the intervention group than in controls after 3 months (24.3% vs. 9.9%, p = 0.03) and 6 months (31.4% vs. 12.7%, p < 0.01). This sub-analysis examines intervention-arm participants' perceived benefits of, and obstacles to, the intervention.
METHODS: We conducted semi-structured interviews (N = 52) between October 2022 and August 2024 with intervention-arm patients aged 50+ years at three hospitals in Boston, Massachusetts. Using rapid qualitative analysis, we identified themes in intervention-arm participants' responses to open-ended questions about the intervention's benefits and the obstacles to continuing ACP outside the ED.
RESULTS: Of 70 intervention-arm participants, 52 completed interviews, two of whom were surrogates. ED GOAL motivated most patients to initiate ACP with outpatient clinicians and loved ones, and it improved the quality of these conversations by clarifying patients' wishes and improving patient-clinician relations. Barriers to continuing ACP were lack of clinician availability and lack of patient/surrogate readiness. Participants with clear care goals found the intervention less useful but harmless.
CONCLUSIONS: The intervention gave participants insight into actionable ACP steps. To address the lack of clinician availability, these conversations could be conducted by non-physician clinicians or through non-personnel resources. Better-tailored ACP interventions may improve patient readiness.
TRIAL REGISTRATION: ClinicalTrials.gov identifier: NCT05209880.
BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation
ArXiv.org · 2025-10-11
Preprint · Open access · Senior author
Multi-LLM systems enhance the creativity of large language models by simulating human collective intelligence, but they suffer from significant drawbacks such as high computational cost and inference latency. To address these limitations, we propose BILLY (BlendIng persona vectors for Large Language model creativitY), a training-free framework that captures the benefits of multi-LLM collaboration, i.e., inducing diverse perspectives and specialized expertise, within a single model. BILLY extracts multiple distinct persona vectors and blends them directly in the model's activation space. We steer the model's generation with this merged vector during inference, enabling multi-perspective output without explicit multi-LLM communication. Our experiments across creativity-oriented benchmarks demonstrate that BILLY surpasses single-model prompting and traditional multi-LLM approaches while substantially reducing inference time and computational cost. Our analyses further reveal that distinct persona vectors can be blended to achieve both effective control over complementary aspects of generation and greater interpretability.
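The BILLY abstract describes blending several persona vectors in activation space and steering generation with the merged vector. The sketch below shows only the generic additive activation-steering pattern this suggests: a weighted average of persona vectors added, with a scale factor, to a hidden-state vector. The normalized weights, the scale `alpha`, and the function names are assumptions; how BILLY extracts persona vectors and which layers it steers are not reproduced here.

```python
from typing import List, Sequence

Vector = List[float]


def blend_personas(personas: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted average of persona vectors (weights normalized to sum to 1).

    Normalization is an illustrative assumption, not taken from the paper.
    """
    if not personas or len(personas) != len(weights):
        raise ValueError("need one weight per persona vector")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(personas[0])
    blended = [0.0] * dim
    for vec, w in zip(personas, weights):
        if len(vec) != dim:
            raise ValueError("persona vectors must share a dimension")
        for i, x in enumerate(vec):
            blended[i] += (w / total) * x
    return blended


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Add a scaled steering direction to one hidden-state vector."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction must share a dimension")
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

In a real model, `steer` would be applied to the activations at one or more chosen layers on every decoding step; because the blend is computed once, a single forward pass carries several "perspectives" at no extra inference cost, which is the efficiency argument the abstract makes against multi-LLM setups.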
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning
ArXiv.org · 2025-06-05
Preprint · Open access
Zero-shot Event Detection (ED), the task of identifying event mentions in natural language text without any training data, is critical for document understanding in specialized domains. Understanding the complex event ontology, extracting domain-specific triggers from the passage, and structuring them appropriately, all at once, overloads Large Language Models (LLMs) and limits their utility for zero-shot ED. To this end, we propose DiCoRe, a divergent-convergent reasoning framework that decouples ED into two components, Dreamer and Grounder. Dreamer encourages divergent reasoning through open-ended event discovery, which helps boost event coverage. Conversely, Grounder introduces convergent reasoning, aligning the free-form predictions with the task-specific instructions through finite-state machine guided constrained decoding. Additionally, an LLM-Judge verifies the final outputs to ensure high precision. Through extensive experiments on six datasets across five domains and nine LLMs, we show that DiCoRe consistently outperforms prior zero-shot, transfer-learning, and reasoning baselines, achieving 4–7% average F1 gains over the best baseline and establishing DiCoRe as a strong zero-shot ED framework.
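Grounder's finite-state machine guided constrained decoding can be illustrated with a toy example: at each step the FSM state determines which tokens are legal, and the decoder picks the highest-scoring legal token. This is a minimal sketch under stated assumptions: the output grammar `<event_type> : <trigger>`, the state names, greedy selection, and the stand-in `score_fn` (in place of an LLM's next-token scores) are all hypothetical and are not DiCoRe's actual FSM.

```python
from typing import Callable, Dict, List, Set

# Hypothetical grammar: "<event_type> : <trigger>", where the event type must
# come from the ontology and the trigger must be a word from the passage.
START, AFTER_TYPE, AFTER_SEP, DONE = range(4)


def allowed_tokens(state: int, ontology: Set[str], passage_words: Set[str]) -> Set[str]:
    """Tokens the FSM permits in a given state."""
    if state == START:
        return ontology
    if state == AFTER_TYPE:
        return {":"}
    if state == AFTER_SEP:
        return passage_words
    return set()


def constrained_greedy_decode(score_fn: Callable[[List[str]], Dict[str, float]],
                              ontology: Set[str],
                              passage_words: Set[str]) -> List[str]:
    """Greedy decoding that masks out every token the FSM disallows.

    `score_fn` is a stand-in for model scores: given the output so far, it
    returns a score per candidate token.
    """
    output: List[str] = []
    state = START
    while state != DONE:
        allowed = allowed_tokens(state, ontology, passage_words)
        scores = score_fn(output)
        candidates = [tok for tok in scores if tok in allowed]
        if not candidates:
            raise ValueError(f"no allowed token in state {state}")
        output.append(max(candidates, key=lambda tok: scores[tok]))
        # In this toy grammar each state emits one token, then advances.
        state += 1
    return output
```

Even if the stand-in scorer ranks an out-of-ontology label such as `"Hallucinated"` highest, the mask forces a legal event type, which is the sense in which constrained decoding aligns free-form predictions with the task's output format.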
Dynamic Generation of Multi-LLM Agents Communication Topologies with Graph Diffusion Models
ArXiv.org · 2025-10-09
Preprint · Open access
The efficiency of multi-agent systems driven by large language models (LLMs) largely hinges on their communication topology. However, designing an optimal topology is a non-trivial challenge, as it requires balancing competing objectives such as task performance, communication cost, and robustness. Existing frameworks often rely on static or hand-crafted topologies, which inherently fail to adapt to diverse task requirements, leading to either excessive token consumption on simple problems or performance bottlenecks on complex ones. To address this challenge, we introduce a novel generative framework called Guided Topology Diffusion (GTD). Inspired by conditional discrete graph diffusion models, GTD formulates topology synthesis as an iterative construction process. At each step, generation is steered by a lightweight proxy model that predicts multi-objective rewards (e.g., accuracy, utility, cost), enabling real-time, gradient-free optimization toward task-adaptive topologies. This iterative, guided synthesis process distinguishes GTD from single-step generative frameworks and enables it to better navigate complex design trade-offs. We validated GTD across multiple benchmarks, and experiments show that the framework generates highly task-adaptive, sparse, and efficient communication topologies, significantly outperforming existing methods in LLM agent collaboration.
Agree to Disagree? A Meta-Evaluation of LLM Misgendering
ArXiv.org · 2025-04-23
Preprint · Open access
Numerous methods have been proposed to measure LLM misgendering, including probability-based evaluations (e.g., automatically with templatic sentences) and generation-based evaluations (e.g., with automatic heuristics or human validation). However, it has gone unexamined whether these evaluation methods have convergent validity, that is, whether their results align. Therefore, we conduct a systematic meta-evaluation of these methods across three existing datasets for LLM misgendering. We propose a method to transform each dataset to enable parallel probability- and generation-based evaluation. Then, by automatically evaluating a suite of 6 models from 3 families, we find that these methods can disagree with each other at the instance, dataset, and model levels, conflicting on 20.2% of evaluation instances. Finally, with a human evaluation of 2400 LLM generations, we show that misgendering behaviour is complex and goes far beyond pronouns, which automatic evaluations are not currently designed to capture, suggesting essential disagreement with human evaluations. Based on our findings, we provide recommendations for future evaluations of LLM misgendering. Our results are also more widely relevant, as they call into question broader methodological conventions in LLM evaluation, which often assume that different evaluation methods agree.
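At the instance level, the convergent-validity question in the misgendering meta-evaluation reduces to asking how often two methods reach different verdicts on the same item. A minimal sketch, assuming each method yields a boolean "misgendered or not" verdict keyed by a shared instance ID; this is a simplification, and the paper's exact definition of a conflict is not given here.

```python
from typing import Dict, Hashable


def disagreement_rate(method_a: Dict[Hashable, bool],
                      method_b: Dict[Hashable, bool]) -> float:
    """Fraction of shared instances on which two binary evaluations differ.

    Assumes boolean verdicts keyed by instance ID; instances scored by only
    one method are ignored.
    """
    shared = method_a.keys() & method_b.keys()
    if not shared:
        raise ValueError("the two methods share no instances")
    conflicts = sum(1 for key in shared if method_a[key] != method_b[key])
    return conflicts / len(shared)
```

The 20.2% figure reported in the abstract comes from the paper's real evaluation data; this toy function only shows the kind of quantity being measured.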
FrameMind: Frame-Interleaved Video Reasoning via Reinforcement Learning
ArXiv.org · 2025-09-28
Preprint · Open access
Current video understanding models rely on fixed frame sampling strategies, processing predetermined visual inputs regardless of the specific reasoning requirements of each question. This static approach limits their ability to adaptively gather visual evidence, leading to suboptimal performance on tasks that require either broad temporal coverage or fine-grained spatial detail. In this paper, we introduce FrameMind, an end-to-end framework trained with reinforcement learning that enables models to dynamically request visual information during reasoning through Frame-Interleaved Chain-of-Thought (FiCOT). Unlike traditional approaches, FrameMind operates in multiple turns where the model alternates between textual reasoning and active visual perception, using tools to extract targeted frames or video clips based on identified knowledge gaps. To train effective dynamic sampling policies, we propose Dynamic Resolution Frame Sampling (DRFS), which exposes models to diverse temporal-spatial trade-offs during learning, and DRFS-GRPO, a group-relative policy optimization algorithm that learns from outcome-based rewards without requiring frame-level annotations. Extensive experiments on challenging benchmarks like MLVU and VideoMME demonstrate that our method significantly outperforms existing models, advancing the state of the art in flexible and efficient video understanding.
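DRFS-GRPO builds on group-relative policy optimization, whose core idea is to score each sampled rollout against the other rollouts for the same question rather than against a learned value function. The sketch below shows only that standard group normalization step, using the population standard deviation and a small epsilon (both assumptions; implementations vary). It does not reproduce DRFS or the paper's reward design.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward within its own group.

    `rewards` holds outcome rewards for several rollouts of the same
    question. Population std and `eps` are illustrative choices.
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, with binary outcome rewards such as 1.0 for a correct answer and 0.0 otherwise, correct rollouts receive positive advantages and incorrect ones negative advantages, with no frame-level labels involved; if every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.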
Recent grants
AI-DCL: Governing bias in AI system with humans in the decision loop
NSF · $300k · 2019–2022
CAREER: MetaQuerier: Dynamic Ad Hoc Information Integration Across the Internet
NSF · $300k · 2002–2007
CRII: RI: Learning Structured Prediction Models with Auxiliary Supervision
NSF · $174k · 2017–2017
ITR: Shallow Integration over the Deep Web: A Holistic Approach
NSF · $306k · 2003–2006
CRII: RI: Learning Structured Prediction Models with Auxiliary Supervision
NSF · $171k · 2017–2020
Frequent coauthors
- Nanyun Peng · 107 shared
- Kuan-Hao Huang · 57 shared
- Wasi Uddin Ahmad · 56 shared
- Cho-Jui Hsieh · 46 shared
- Aram Galstyan · 46 shared
- Jieyu Zhao · 38 shared
- Liunian Harold Li · 36 shared
- Muhao Chen · 28 shared · University of California, Davis
Awards & honors
- Google Research Scholar Award, 2021
- EMNLP Best Long Paper Award, 2017
- C. L. and Jane W. S. Liu Award, University of Illinois, 2013
- Yahoo! Key Scientific Challenges Program Award, 2010
- KDD Best Paper Award, 2010