
Rumi Chunara
Associate Professor of Computer Science and Engineering and Biostatistics · New York University · Computer Science
Active 2004–2026
About
Rumi Chunara is an Associate Professor at NYU Tandon School of Engineering, jointly appointed in the Department of Computer Science and Engineering and the School of Global Public Health in Biostatistics. She is the Director of the Center for Health Data Science. Her work focuses on the intersection of computer science, biostatistics, and public health, with an emphasis on themes of public health and equity. Chunara's research involves designing and developing data mining and machine learning methods to address challenges related to data and public health goals, as well as exploring issues of fairness and ethics in the design and use of data and algorithms embedded in social systems. Her research interests include public and population health, socio-environmental determinants, machine learning and AI in societal systems, algorithmic fairness, and inequity and technology. Chunara has contributed to developing computational and statistical methods, including data mining and machine learning, to understand social determinants of health and health disparities. Her work includes analyzing social media posts to map systemic racism and homophobia, assessing mental health impacts, and analyzing electronic health records to understand disparities in telemedicine access during COVID-19. She has received numerous awards, including the NSF CAREER Award, Facebook Research Award, and MIT Technology Review Top 35 Innovators Under 35. Chunara has been involved in innovative projects such as developing AI systems to map urban green spaces using satellite imagery, which helps expose environmental divides and inform urban planning for healthier cities.
Research topics
- Computer Science
- Artificial Intelligence
- Medicine
- Machine Learning
- Virology
- Political Science
- Computer Security
- Mathematics
- Environmental health
- Internal medicine
- Gerontology
- Simulation
- Psychology
- Data science
- Risk analysis (engineering)
- Social psychology
- Economic growth
- Pathology
- Economics
Selected publications
Monsoon weather and food security in Pakistan
Food Security · 2026-01-06
article · Open access · Senior author
Abstract: The intensification of monsoon weather anomalies due to climate change poses significant challenges to food security in South East Asia. In this paper we focus on Pakistan, to understand how dry conditions during the monsoon season affect food security. Using the Standardized Precipitation Evapotranspiration Index, we examine the effects of drier monsoon seasons on self-assessed food security. For this purpose, we leverage Pakistan Social and Living Standards Measurement Survey data for 2019–2020 on 147,063 households and combine it at the district level with the European Re-Analysis-Land meteorological data provided by the Copernicus Data Store. Our findings highlight an 8.5% increase in mild to severe food insecurity with exposure to a dry monsoon season. The impact is mostly concentrated on the quality and diversity of food available suggestive of a change in the food balance rather than in the total caloric intake. We also observe dry conditions to determine a larger decrease in food diversity in individuals with none or low educational attainment. Findings suggest that the increased unpredictability of monsoon weather could exacerbate food insecurity with vulnerable groups being affected foremost in Pakistan and highlights the critical need for measures to mitigate the impact of droughts on food diversity and quality.
Figshare · 2026-01-31
other · Open access
Abstract
Background: Colorectal cancer (CRC) is a leading cause of cancer-related morbidity and mortality in the United States (U.S.). CRC screening (CRCS) is an important method of cancer prevention and early detection, but often receives less public attention than other cancer and screening types. Regular CRCS is recommended for most individuals aged 45–75, though screening rates are lower than desired in the U.S. Despite similar screening rates, Black Americans experience a variety of CRC-related disparities including higher incidence and mortality rates in the United States. Research suggests Black Americans often do not receive timely follow ups to stool-based screening tests and receive lower quality endoscopic care. Effective public health communication strategies are needed to improve CRCS rates, particularly among populations experiencing disparities. The current study tests the ability of a novel approach to content generation and selection (crowdsourcing) to identify effective public health communication content.
Methods/design: This protocol is for a five-arm, online, randomized controlled trial testing different CRCS content types among white and Black/African Americans. In total, 2,000 non-Hispanic Black (n = 1,000) and white (n = 1,000) Americans will be recruited into the trial, randomized into one of five conditions. The five conditions are based on content rankings and preferences from our previous work: (1) overall preferred content, (2) Black American preferred content, (3) white American preferred content, (4) median ranked content (i.e., “standard of care”), and (5) control/no exposure to content. The primary outcomes are intentions to adhere to CRCS recommendations and screening preferences for intentions to adhere. Secondary outcomes include likelihood of sharing information via social media and information sharing behavior. Comparisons will be examined for differences in outcomes as hypothesized, with exploratory analyses undertaken as well. The study is powered to detect a small effect, which is expected based on past research.
Discussion: The trial aims to determine the effectiveness of crowdsource-selected CRCS messages to improve screening intentions and likelihood of sharing information via social media. This trial takes an important step in testing an innovative and scalable approach to identify and select content to be disseminated by public health communicators.
Trial registration: NCT06712901.
Figshare · 2026-01-31
article · Supplementary Material 2.
Prior Authorization Requirements and Prescription Fill Patterns Among Patients With Heart Failure
JACC Advances · 2026-01-24 · 1 citation
article · Open access
BACKGROUND: Prior authorizations could hinder the filling of life-saving heart failure (HF) medications, such as angiotensin receptor neprilysin inhibitors (ARNIs) and sodium glucose cotransporter 2 inhibitors (SGLT2is). OBJECTIVES: The aim of the study was to determine whether prior authorizations were associated with delayed or decreased filling for ARNI and SGLT2i. METHODS: This was a retrospective cohort study using electronic health record, pharmacy fill, and neighborhood-level data from a large, academic health system. We included patients with HF and a new prescription for ARNI or SGLT2i between April 1, 2021, and April 30, 2023, and assessed for presence of prior authorization requirement. Outcomes included days to first fill and never filling the prescription. Analyses were conducted using inverse probability weighting methods. RESULTS: Among 2,183 patients, 12.2% (152/1,243) and 14.3% (165/1,150) had a prior authorization requirement for ARNI or SGLT2i, respectively. Patients requiring prior authorization tended to be younger, identify as non-Hispanic Black or Hispanic, have non-Medicare insurance, and have fewer comorbidities. In weighted models, patients requiring prior authorization took 3.03 (95% CI: 2.16-4.25) times longer to fill ARNI, 6.75 (95% CI: 4.44-10.3) times longer to fill SGLT2i, and were 2.23 (95% CI: 1.37-3.65) times more likely to never fill SGLT2i prescriptions (all P < 0.001). CONCLUSIONS: Prior authorization requirements were more common for patients identifying as Black or Hispanic and were associated with decreased and delayed filling of ARNI and SGLT2i. Our findings highlight an important barrier to mortality-reducing, guideline-recommended medications for HF.
Harnessing data science for health discovery and innovation in Africa (DS-I Africa)
Communications Medicine · 2026-02-28
article · Open access
The integration of data science and artificial intelligence approaches, through multidisciplinary collaboration, is set to advance health research in Africa. Here, we describe the Harnessing Data Science for Health Discovery and Innovation in Africa-funded effort for enhancing healthcare delivery, disease surveillance, predictive modelling, capacity building and research innovation across Africa and globally. Xiong et al. discuss the Data Science for Health Discovery and Innovation in Africa, a US-Africa partnership aiming to transform health via innovative data science. Achievements to date include 38 projects in 22 African countries, an open FAIR data ecosystem, and AI tools for disease management and pandemic preparedness.
BMC Public Health · 2026-01-31
article · Open access
Colorectal cancer (CRC) is a leading cause of cancer-related morbidity and mortality in the United States (U.S.). CRC screening (CRCS) is an important method of cancer prevention and early detection, but often receives less public attention than other cancer and screening types. Regular CRCS is recommended for most individuals aged 45–75, though screening rates are lower than desired in the U.S. Despite similar screening rates, Black Americans experience a variety of CRC-related disparities including higher incidence and mortality rates in the United States. Research suggests Black Americans often do not receive timely follow ups to stool-based screening tests and receive lower quality endoscopic care. Effective public health communication strategies are needed to improve CRCS rates, particularly among populations experiencing disparities. The current study tests the ability of a novel approach to content generation and selection (crowdsourcing) to identify effective public health communication content. This protocol is for a five-arm, online, randomized controlled trial testing different CRCS content types among white and Black/African Americans. In total, 2,000 non-Hispanic Black (n = 1,000) and white (n = 1,000) Americans will be recruited into the trial, randomized into one of five conditions. The five conditions are based on content rankings and preferences from our previous work: (1) overall preferred content, (2) Black American preferred content, (3) white American preferred content, (4) median ranked content (i.e., “standard of care”), and (5) control/no exposure to content. The primary outcomes are intentions to adhere to CRCS recommendations and screening preferences for intentions to adhere. Secondary outcomes include likelihood of sharing information via social media and information sharing behavior. Comparisons will be examined for differences in outcomes as hypothesized, with exploratory analyses undertaken as well. The study is powered to detect a small effect, which is expected based on past research. The trial aims to determine the effectiveness of crowdsource-selected CRCS messages to improve screening intentions and likelihood of sharing information via social media. This trial takes an important step in testing an innovative and scalable approach to identify and select content to be disseminated by public health communicators. Trial registration: NCT06712901.
Identity-Robust Language Model Generation via Content Integrity Preservation
arXiv (Cornell University) · 2026-01-14
preprint · Open access · Senior author
Large Language Model (LLM) outputs often vary across user sociodemographic attributes, leading to disparities in factual accuracy, utility, and safety, even for objective questions where demographic information is irrelevant. Unlike prior work on stereotypical or representational bias, this paper studies identity-dependent degradation of core response quality. We show empirically that such degradation arises from biased generation behavior, despite factual knowledge being robustly encoded across identities. Motivated by this mismatch, we propose a lightweight, training-free framework for identity-robust generation that selectively neutralizes non-critical identity information while preserving semantically essential attributes, thus maintaining output content integrity. Experiments across four benchmarks and 18 sociodemographic identities demonstrate an average 77% reduction in identity-dependent bias compared to vanilla prompting and a 45% reduction relative to prompt-based defenses. Our work addresses a critical gap in mitigating the impact of user identity cues in prompts on core generation quality.
Recent grants
NYU-Moi Data Science for Health Determinants Training Program
NIH · $1.9M · 2025–2026
CAREER: Learning from When, Where and by Whom Data is Generated for Advancing Public Health Studies
NSF · $550k · 2019–2026
ATD: Collaborative Research: Algorithms and Data for High-Frequency, Real-Time Anomaly Detection
NSF · $100k · 2017–2020
Detecting Youth Drinking and Associations with Alcohol Policies via Social Media
NIH · $464k · 2015–2018
NSF · $673k · 2013–2015
Frequent coauthors
- John S. Brownstein (Boston Children's Hospital) · 68 shared
- Vishwali Mhasawade · 18 shared
- Stephanie Cook (New York University) · 13 shared
- Kunal Relia · 12 shared
- Clark C. Freifeld · 11 shared
- Zainab Samad (Aga Khan University) · 11 shared
- Nabeel Abdur Rehman (New York University) · 10 shared
- Salim S. Virani (Michael E. DeBakey VA Medical Center) · 10 shared
Labs
Center for Health Data Science · PI
Awards & honors
- ACM Distinguished Member
- NSF CAREER Award
- Facebook Research Award
- MIT Presidential Fellow
- Gates Foundation Grand Challenges Exploration Award