Christina Boucher

Ph.D. · Professor · Verified

University of Florida · Computer & Information Science & Engineering

Active 1984–2026

h-index: 16
Citations: 921
Papers: 120 (72 in the last 5 years)
Funding: $4.6M (1 active)

About

Christina Boucher is a Professor in the Department of Computer & Information Science & Engineering at the University of Florida. Her research, as reflected in the topics and publications listed below, centers on computational biology and algorithms: compressed data structures for indexing large pangenomic collections (prefix-free parsing, the r-index), short-read alignment and taxonomic classification, and machine learning for predicting antimicrobial resistance.

Research topics

  • Computer Science
  • Data Mining
  • Biology
  • Artificial Intelligence
  • Information Retrieval
  • Statistics
  • Algorithm
  • Mathematics
  • Theoretical computer science
  • Database
  • Computational biology
  • Genetics
  • Combinatorics

Selected publications

  • Population differences of chromosome 22q11.2 duplication structure predispose differentially to microdeletion and inversion

    Nature Communications · 2026-04-18

    article · Open access

    Chromosome 22q11.2 microdeletion syndrome (22q11.2DS) is mediated by high-identity polymorphic low-copy repeats (LCRA-to-D) that have been challenging to sequence characterize. We sequence-resolved 135 chromosome 22q11.2 haplotypes from diverse humans and define 63 distinct structural configurations differing in size by 11-fold for LCRA. This diversity is driven by a 105 kbp segmental duplication flanked by 25 kbp inverted repeats that arose in the apes but expanded in humans ~1 million years ago. African LCRA haplotypes are significantly longer (p = 0.0047) and predicted to be more protective against 22q11.2DS (p = 1.14×10⁻⁶) due to enrichment of inverted 105 kbp repeats. We identify nine distinct (including five recurrent) inversions spanning LCRA-D. Sequencing four families indicates LCRA-D deletions map to 105 kbp repeats, whereas inversions map to the 25 kbp repeats. Here, we show specific haplotype LCR architectures and recurrent large-scale inversions modulate susceptibility to 22q11.2DS and help explain its reduced prevalence among individuals of African ancestry.

  • Rapid-PFP: Accelerating Prefix-Free Parsing with GPU Parallelism

    bioRxiv (Cold Spring Harbor Laboratory) · 2026-05-01

    article · Senior author

    Prefix-Free Parsing (PFP) is widely used in genomic data processing to construct compressed indexes on massive, highly repetitive datasets. However, existing CPU implementations are constrained by sequential bottlenecks, limiting their ability to scale to large modern pangenomic collections. We introduce RAPID-PFP, a redesigned implementation of the PFP algorithm that takes advantage of the massive parallelism and high memory bandwidth of modern GPUs. RAPID-PFP parallelizes trigger-string detection, phrase parsing, dictionary construction, and parse generation through custom CUDA kernels and GPU-resident data structures built using cuDF, CuPy, and Numba-CUDA. The algorithm operates entirely within GPU memory, minimizes host interaction, and dynamically adapts to available VRAM, enabling efficient processing in a range of hardware configurations. Across E. coli and Human Pangenome (HPRC) datasets, RAPID-PFP produces identical output to established CPU pipelines while delivering an order-of-magnitude acceleration. On 3,682 E. coli assemblies, RAPID-PFP reduces runtime from 552 seconds to 17 seconds compared to PFP-FL (32.1 times) and from 1,078 seconds to 17 seconds compared to PFP-ITL (62.6 times). On the complete 46-sample HPRC dataset, RAPID-PFP achieves a 33.4-fold speedup and successfully processes scales that PFP-ITL cannot handle. Performance improves with dataset size, reflecting that PFP maps naturally onto thousands of CUDA cores, yielding sublinear scaling relative to CPU implementations. RAPID-PFP demonstrates that foundational compressed-indexing algorithms can be re-engineered for accelerators, enabling scalable and practical preprocessing for large-scale genomic indexing workflows.

  • Scalable machine learning improves resistance prediction and identifies novel determinants in Mycobacterium tuberculosis

    bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-29

    article · Senior author

    Multidrug-resistant and extensively drug-resistant Mycobacterium tuberculosis (MTB) represents a growing global health crisis, characterized by limited treatment options and high mortality rates. Rapid and accurate prediction of resistance profiles is critical to guide effective therapy and curb transmission. Whole-genome sequencing (WGS) offers promise for individualized resistance profiling, yet existing computational tools remain constrained by predefined mutation catalogs and prohibitive resource requirements for large-scale analyses. Here, we present AURA, a GPU-accelerated, pangenome-scale machine learning framework for de novo resistance prediction. Trained on 12,185 globally diverse MTB isolates, AURA predicts resistance to 13 first-line, second-line, and repurposed antibiotics with high precision and identifies 59 novel resistance-associated loci, including variants in katG, pncA, rpoC, and members of the PE/PGRS gene family. By enabling model training on an unprecedented genomic scale, AURA provides new insights into the genetic architecture of resistance and establishes a scalable platform for precision-guided therapy and global surveillance of MTB.

  • Building genomic data structures from compressed representations using prefix-free parsing

    Genome Research · 2026-05-15

    preprint · Senior author

    Advances in high-throughput sequencing have lowered the cost and complexity of genome sequencing, making it possible for the first time to assemble large pangenomic data sets for many species. These data sets, comprising thousands of individuals, already span from hundreds of gigabytes to petabytes, far exceeding the memory capacity of most machines, and are expected to continue growing in scale over time. Already, many traditional bioinformatics tools fail on inputs at this scale because they cannot construct their necessary data structures within memory limits. There is a growing need for methods that can construct these structures directly from compressed representations. Prefix-free parsing (PFP) addresses this challenge. PFP serves as a preprocessing step that compresses sufficiently repetitive text, yet still permits building important data structures for the original data set from its compressed output. This survey offers an overview of PFP, covering its core principles, the primary data structures it enables, current applications, and future research directions.

  • Enhanced barrier precautions to prevent transmission of Staphylococcus aureus and Carbapenem-resistant organisms in nursing home chronic ventilator units

    Infection Control and Hospital Epidemiology · 2025-08-22 · 2 citations

    article

    Objective: Assess the feasibility and effect of Enhanced Barrier Precautions (EBP) on the transmission of Staphylococcus aureus (SA) and carbapenem-resistant organisms (CRO) among residents in nursing home chronic ventilator units (NH-CVU). Design: Pre-post interventional study. Setting: Two community-based nursing homes with CVUs in Maryland. A total of 56 residents were enrolled in the baseline period and 64 residents were enrolled in the intervention period. Methods: During a 3-month baseline and intervention period, residents were swabbed monthly to estimate SA and CRO acquisition. During a 2-month training period, EBP was implemented for residents with chronic wounds, medical devices, or history of multidrug-resistant organism (MDRO) colonization. During the subsequent 3-month intervention period, healthcare personnel (HCP) wore gowns and gloves for high-contact care activities when residents were on EBP. Whole genome sequencing assessed resident-to-resident transmission. Results: At baseline, NH-CVU1 used gowns and gloves for all direct contact, while NH-CVU2 used EBP only for residents with a history of MDRO colonization. After training, the proportion of NH-CVU2 residents on EBP increased from 65% in the baseline period to 87% in the intervention period. Glove use was high (93–98%) in both NH-CVUs. Gown use increased from 39% to 77% in NH-CVU1 and from 26% to 72% in NH-CVU2. Resident-to-resident transmission of SA or CRO decreased by 25% in NH-CVU1 (p = 0.60) and by 67% in NH-CVU2 (p = 0.05). CRO transmission decreased by 33% in NH-CVU1 (p = 0.54) and by 83% in NH-CVU2 (p = 0.02). Conclusions: EBP is feasible and potentially decreases overall and CRO transmission in nursing home CVUs.

  • Formal verification of bioinformatics software using model checking and theorem proving

    Briefings in Bioinformatics · 2025-07-01

    article · Open access

    While there is explosive growth in the creation of biological data, researchers rely on ad hoc verification methods such as testing with small simulated datasets. Due to their importance in biology and biomedicine, there is a critical need to verify these algorithms as well as their implementations to ensure that the results and conclusions are trustworthy. In this paper, we explore an effective combination of model checking and theorem proving of bioinformatics software, including BiopLib, BWA, Jellyfish, SDSL, Dashing, SPAdes, and MUMmer. We provide results for model checking for bioinformatics software libraries and theorem proving for specific properties. Our model checking framework found several potential flaws in two tools (BiopLib and BWA). We have also detected several failing cases in the Succinct Data Structures Library (SDSL).

  • Robust 16S rRNA classification based on a compressed LCA index

    Genome Research · 2025-08-25 · 2 citations

    article · Open access

    Taxonomic sequence classification is a computational problem central to the study of metagenomics and evolution. Advances in compressed indexing with the r-index enable full-text pattern matching against large sequence collections. But the data structures that link pattern sequences to their clades of origin still do not scale well to large collections. Previous work proposed the document array profiles, which use O(rd) words of space, where r is the number of maximal equal-letter runs in the Burrows–Wheeler transform, and d is the number of distinct genomes. The linear dependence on d is limiting, because real taxonomies can easily contain 10,000s of leaves or more. We propose a method called cliff compression that reduces this size by a large factor, >250× when indexing the SILVA 16S rRNA gene database. This method uses Θ(r log d) words of space in expectation under a random model we propose here. We implemented these ideas in an open-source tool called Cliffy that performs efficient taxonomic classification of sequencing reads with respect to a compressed taxonomic index. When applied to simulated 16S rRNA reads, Cliffy's read-level accuracy is higher than Kraken2's by 11%–18%. Clade abundances are also more accurately predicted by Cliffy compared with Kraken2 and Bracken. Overall, Cliffy is a fast and space-economical extension to compressed full-text indexes, enabling them to perform fast and accurate taxonomic classification queries. Cliffy's accuracy underscores the advantages of full-text indexes, which offer a more precise solution compared with k-mer indexes designed for a specific k value.

  • SDSL-Mobile: Enabling space-efficient data structures for mobile applications

    SoftwareX · 2025-06-24

    article · Open access · Senior author · Corresponding author

    This paper presents the process and results of porting the Succinct Data Structure Library 2.0 (SDSL-lite), a robust and well-established open-source C++11 library, to Android platforms. The resulting library, called SDSL-Mobile, implements space-efficient data structures, including wavelet trees, compressed suffix arrays, and bit vectors, which are essential for handling large datasets in domains such as bioinformatics and information retrieval. Although originally designed for desktop environments, the library is extended to Android using the Android Native Development Kit (NDK) to enable integration into mobile platforms. Functionality is evaluated by implementing wavelet forests within an Android application, and performance is compared against a desktop implementation. The results demonstrate the feasibility of deploying succinct data structures on mobile devices, highlighting new possibilities for advanced data processing in resource-constrained environments.

  • Accurate short-read alignment through r-index-based pangenome indexing

    Genome Research · 2025-06-12 · 2 citations

    article · Open access · Senior author

    Aligning to a linear reference genome can result in a higher percentage of reads going unmapped or being incorrectly mapped owing to variations not captured by the reference, otherwise known as reference bias. Recently, in efforts to mitigate reference bias, there has been a movement to switch to using pangenomes, a collection of genomes, as the reference. In this paper, we introduce Moni-align, the first short-read pangenome aligner built on the r-index, a variation of the classical FM-index that can index collections of genomes in O(r) space, where r is the number of runs in the Burrows–Wheeler transform. Moni-align uses a seed-and-extend strategy for aligning reads, utilizing maximal exact matches as seeds, which can be efficiently obtained with the r-index. Using both simulated and real short-read data sets, we demonstrate that Moni-align achieves alignment accuracy comparable to vg map and vg giraffe, the leading pangenome aligners. Although currently best suited for aligning to localized pangenomes owing to computational constraints, Moni-align offers a robust foundation for future optimizations that could further broaden its applicability.

  • Pangenome-Scale Machine Learning Advances Resistance Prediction and Uncovers Novel Mycobacterium Tuberculosis Mechanisms

    SSRN Electronic Journal · 2025-01-01

    preprint · Open access · Senior author
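
Several of the abstracts above state index sizes in terms of r, the number of maximal equal-letter runs in the Burrows–Wheeler transform (BWT). The following is a minimal sketch of that quantity, not code from any of the listed tools. It assumes a naive sorted-rotations BWT and a "$" sentinel that does not occur in the input; both are illustrative choices, and the function names `bwt` and `count_runs` are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations.

    Illustration only: this takes quadratic space, unlike the
    compressed constructions the abstracts describe.
    Assumes "$" is absent from text and sorts before every other symbol.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal equal-letter runs in s (the quantity r)."""
    if not s:
        return 0
    # A new run starts wherever two adjacent symbols differ.
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))  # annb$aa 5

    # A repetitive input: compare r against the text length.
    repetitive = "GATTACA" * 8
    print(len(repetitive), count_runs(bwt(repetitive)))
```

On repetitive inputs such as collections of similar genomes, r tends to grow much more slowly than the text length, which is why space bounds like O(r) are attractive for pangenome indexes.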

Recent grants

Frequent coauthors

  • Travis Gagie · Dalhousie University · 41 shared
  • Mattia Prosperi · University of Florida · 29 shared
  • Noelle Noyes · University of Minnesota · 25 shared
  • Massimiliano Rossi · Illumina (United States) · 24 shared
  • Giovanni Manzini · 19 shared
  • Marco Antônio Oliva · University of Florida · 19 shared
  • Alan Kuhnle · 16 shared
  • Ben Langmead · Johns Hopkins University · 15 shared

Awards & honors

  • UF Term Professorship, 2021