Chenyang Jiang

Professor of Civil & Environmental Engineering · Verified

University of Maryland, College Park · Environmental & Occupational Health

Active 1999–2024

h-index: 24

Citations: 1.4k

Papers: 72 (22 in the last 5 years)

Funding: —


About

Dr. Sunny Jiang is a professor of Environmental Engineering in the Department of Civil and Environmental Engineering with a joint appointment in the Department of Ecology and Evolutionary Biology at the University of California, Irvine. She teaches graduate and undergraduate courses including Environmental Processes, Microbiology for Engineers, and Desalination & Water Reuse. With over 20 years of experience in environmental microbiology research, Dr. Jiang is a recognized leader in the fields of water quality, pathogen detection, microbial risk assessment, and membrane biofouling. She has authored more than 70 research publications in environmental microbiology. Dr. Jiang earned her Ph.D. in Marine Science from the University of South Florida and completed postdoctoral training with Dr. Rita Colwell at the University of Maryland. She has received several honors and awards, including the Excellent Research Mentor Award and the Sacket Prize for Innovative Research. Outside of her professional work, Dr. Jiang enjoys outdoor activities and has completed eight full marathons and many half marathons.

Research topics

Environmental science
Ecology
Medicine
Chemistry
Environmental chemistry
Oceanography
Geography
Internal medicine
Pediatrics
Biology
Environmental health
Demography
Animal science
Environmental engineering
Veterinary medicine
Meteorology
Geology

Selected publications

Deep active learning with KD-Tree based greedy sampling in structural simulation
2024-04-24
Book chapter · 1st author · Corresponding author
Deep learning has found extensive application in the realm of maritime technology and engineering, such as the development of surrogate model for structural simulation. However, it is expensive to generate training examples, particularly in structural simulation. To address the need for reducing the expensive process of generating training examples, a deep active learning (DAL) with KD-Tree based greedy sampling is proposed. Within this active learning method, a predefined acquisition function is used to select samples for querying from unlabeled sample pool. The acquisition function is based on the greedy sampling approach where the diversity in input and output spaces is increased by selecting new samples from the unlabeled sample pool. The data in the unlabeled sample pool is structured for KD-Tree construction, resulting in a significant reduction in the time complexity of the DAL method. A comparative study with random sampling is made to demonstrate the effectiveness and advantage of the proposed DAL method. A significantly lower test loss and variance is achieved by the proposed method compared to the random sampling approach under conditions of a small sample size. This approach can be applied more broadly to enhance other deep learning models used in the maritime technology and engineering domain, as it provides a computationally efficient way to achieve additional data.
SUDO: a framework for evaluating clinical artificial intelligence systems without ground-truth annotations
Research Square · 2024-01-12
Preprint · Open access
SUDO: a framework for evaluating clinical artificial intelligence systems without ground-truth annotations
arXiv (Cornell University) · 2024-01-02
Preprint · Open access
A clinical artificial intelligence (AI) system is often validated on a held-out set of data which it has not been exposed to before (e.g., data from a different hospital with a distinct electronic health record system). This evaluation process is meant to mimic the deployment of an AI system on data in the wild; those which are currently unseen by the system yet are expected to be encountered in a clinical setting. However, when data in the wild differ from the held-out set of data, a phenomenon referred to as distribution shift, and lack ground-truth annotations, it becomes unclear the extent to which AI-based findings can be trusted on data in the wild. Here, we introduce SUDO, a framework for evaluating AI systems without ground-truth annotations. SUDO assigns temporary labels to data points in the wild and directly uses them to train distinct models, with the highest performing model indicative of the most likely label. Through experiments with AI systems developed for dermatology images, histopathology patches, and clinical reports, we show that SUDO can be a reliable proxy for model performance and thus identify unreliable predictions. We also demonstrate that SUDO informs the selection of models and allows for the previously out-of-reach assessment of algorithmic bias for data in the wild without ground-truth annotations. The ability to triage unreliable predictions for further inspection and assess the algorithmic bias of AI systems can improve the integrity of research findings and contribute to the deployment of ethical AI systems in medicine.
A framework for evaluating clinical artificial intelligence systems without ground-truth annotations
Nature Communications · 2024-02-28 · 18 citations
Article · Open access
A clinical artificial intelligence (AI) system is often validated on data withheld during its development. This provides an estimate of its performance upon future deployment on data in the wild; those currently unseen but are expected to be encountered in a clinical setting. However, estimating performance on data in the wild is complicated by distribution shift between data in the wild and withheld data and the absence of ground-truth annotations. Here, we introduce SUDO, a framework for evaluating AI systems on data in the wild. Through experiments on AI systems developed for dermatology images, histopathology patches, and clinical notes, we show that SUDO can identify unreliable predictions, inform the selection of models, and allow for the previously out-of-reach assessment of algorithmic bias for data in the wild without ground-truth annotations. These capabilities can contribute to the deployment of trustworthy and ethical AI systems in medicine.
A systematic approach towards missing lab data in electronic health records: A case study in non‐small cell lung cancer and multiple myeloma
CPT Pharmacometrics & Systems Pharmacology · 2023-06-15 · 12 citations
Article · Open access
Real-world data derived from electronic health records often exhibit high levels of missingness in variables, such as laboratory results, presenting a challenge for statistical analyses. We developed a systematic workflow for gathering evidence of different missingness mechanisms and performing subsequent statistical analyses. We quantify evidence for missing completely at random (MCAR) or missing at random (MAR), mechanisms using Hotelling's multivariate t-test, and random forest classifiers, respectively. We further illustrate how to apply sensitivity analyses using the not at random fully conditional specification procedure to examine changes in parameter estimates under missing not at random (MNAR) mechanisms. In simulation studies, we validated these diagnostics and compared analytic bias under different mechanisms. To demonstrate the application of this workflow, we applied it to two exemplary case studies with an advanced non-small cell lung cancer and a multiple myeloma cohort derived from a real-world oncology database. Here, we found strong evidence against MCAR, and some evidence of MAR, implying that imputation approaches that attempt to predict missing values by fitting a model to observed data may be suitable for use. Sensitivity analyses did not suggest meaningful departures of our analytic results under potential MNAR mechanisms; these results were also in line with results reported in clinical trials.
A Natural Language Processing Algorithm to Improve Completeness of ECOG Performance Status in Real-World Data
Applied Sciences · 2023-05-18 · 20 citations
Article · Open access · Senior author
Our goal was to develop and characterize a Natural Language Processing (NLP) algorithm to extract Eastern Cooperative Oncology Group Performance Status (ECOG PS) from unstructured electronic health record (EHR) sources to enhance observational datasets. By scanning unstructured EHR-derived documents from a real-world database, the NLP algorithm assigned ECOG PS scores to patients diagnosed with one of 21 cancer types who lacked structured ECOG PS numerical scores, anchored to the initiation of treatment lines. Manually abstracted ECOG PS scores were used as a source of truth to both develop the algorithm and evaluate accuracy, sensitivity, and positive predictive value (PPV). Algorithm performance was further characterized by investigating the prognostic value of composite ECOG PS scores in patients with advanced non-small cell lung cancer receiving first line treatment. Of N = 480,825 patient-lines, structured ECOG PS scores were available for 290,343 (60.4%). After applying NLP-extraction, the availability increased to 73.2%. The algorithm’s overall accuracy, sensitivity, and PPV were 93% (95% CI: 92–94%), 88% (95% CI: 87–89%), and 88% (95% CI: 87–89%), respectively across all cancer types. In a cohort of N = 51,948 aNSCLC patients receiving 1L therapy, the algorithm improved ECOG PS completeness from 61.5% to 75.6%. Stratification by ECOG PS showed worse real-world overall survival (rwOS) for patients with worse ECOG PS scores. We developed an NLP algorithm to extract ECOG PS scores from unstructured EHR documents with high accuracy, improving data completeness for EHR-derived oncology cohorts.
Considerations for the Use of Machine Learning Extracted Real-World Data to Support Evidence Generation: A Research-Centric Evaluation Framework
Cancers · 2022-06-22 · 27 citations
Review · Open access
A vast amount of real-world data, such as pathology reports and clinical notes, are captured as unstructured text in electronic health records (EHRs). However, this information is both difficult and costly to extract through human abstraction, especially when scaling to large datasets is needed. Fortunately, Natural Language Processing (NLP) and Machine Learning (ML) techniques provide promising solutions for a variety of information extraction tasks such as identifying a group of patients who have a specific diagnosis, share common characteristics, or show progression of a disease. However, using these ML-extracted data for research still introduces unique challenges in assessing validity and generalizability to different cohorts of interest. In order to enable effective and accurate use of ML-extracted real-world data (RWD) to support research and real-world evidence generation, we propose a research-centric evaluation framework for model developers, ML-extracted data users and other RWD stakeholders. This framework covers the fundamentals of evaluating RWD produced using ML methods to maximize the use of EHR data for research purposes.
Environmental Injustice and Industrial Chicken Farming in Maryland
International Journal of Environmental Research and Public Health · 2021-10-20 · 13 citations
Article · Open access
Maryland’s growing chicken industry, including concentrated animal feeding operations (CAFOs) and meat processing plants, raises a number of concerns regarding public health and environmental justice. Using hot spot analysis, we analyzed the totality of Maryland’s CAFOs and meat processing plants and those restricted to the Eastern Shore to assess whether communities of color and/or low socioeconomic status communities disproportionately hosted these types of facilities at the census tract level. We used zero-inflated regression modeling to determine the strength of the associations between environmental justice variables and the location of CAFOs and meatpacking facilities at the State level and on the Eastern Shore. Hot spot analyses demonstrated that CAFO hot spots on the Eastern Shore were located in counties with some of the lowest wealth in the State, including the lowest ranking county—Somerset. Zero-inflated regression models demonstrated that increases in median household income across the state were associated with a 0.04-unit reduction in CAFOs. For every unit increase in the percentage of people of color (POC), there was a 0.02-unit increase in meat processing facilities across the state. The distribution of CAFOs and meat processing plants across Maryland may contribute to poor health outcomes in areas affected by such production, and contribute to health disparities and health inequity.
Global Population Exposed to Extreme Events in the 150 Most Populated Cities of the World: Implications for Public Health
International Journal of Environmental Research and Public Health · 2021-02-01 · 22 citations
Article · Open access
Climate change driven increases in the frequency of extreme heat events (EHE) and extreme precipitation events (EPE) are contributing to both infectious and non-infectious disease burden, particularly in urban city centers. While the share of urban populations continues to grow, a comprehensive assessment of populations impacted by these threats is lacking. Using data from weather stations, climate models, and urban population growth during 1980-2017, here, we show that the concurrent rise in the frequency of EHE, EPE, and urban populations has resulted in over 500% increases in individuals exposed to EHE and EPE in the 150 most populated cities of the world. Since most of the population increases over the next several decades are projected to take place in city centers within low- and middle-income countries, skillful early warnings and community specific response strategies are urgently needed to minimize public health impacts and associated costs to the global economy.
Enteric Viruses and Pepper Mild Mottle Virus Show Significant Correlation in Select Mid-Atlantic Agricultural Waters
Applied and Environmental Microbiology · 2021-04-23 · 12 citations
Article · Open access
Microbiological analysis of agricultural waters is fundamental to ensure microbial food safety. The highly variable nature of nontraditional sources of irrigation water makes them particularly difficult to test for the presence of viruses. Multiple characteristics influence viral persistence in a water source, as well as affecting the recovery and detection methods that are employed. Testing for a suite of viruses in water samples is often too costly and labor-intensive, making identification of suitable indicators for viral pathogen contamination necessary. The results from this study address two critical data gaps, namely, EV prevalence in surface and reclaimed waters of the Mid-Atlantic region of the United States and subsequent evaluation of physicochemical and atmospheric parameters used to inform the potential for the use of indicators of viral contamination.

Frequent coauthors

Amir Sapkota
28 shared
Sacoby Wilson
University of Mary
20 shared
Rianna Murray
14 shared
Raghu Murtugudde
University of Maryland, College Park
14 shared
Amy R. Sapkota
University of Maryland, College Park
13 shared
Clifford S. Mitchell
Maryland Department of Health
13 shared
Kristen Burwell-Naney
Durham County Department of Public Health
12 shared
LaShanta Rice
10 shared

Labs

Sunny Jiang Lab · PI

Education

Ph.D.
University of South Florida
1996
Postdoctoral training, with Dr. Rita Colwell
University of Maryland
1997

Awards & honors

Lifetime Achievement Awards, 2024, by Chinese-American Profe…
Outstanding Publication Award, 2023, by Association of Envir…
Innovator of the Year Award, Potable Pathogen Analyzer, 2021…
