
Andrew Nobel
Distinguished Professor · Verified · University of North Carolina at Chapel Hill · Statistics
Active 1991–2026
About
Andrew B. Nobel is the Paul Ziff Distinguished Professor in the Department of Statistics and Operations Research at the University of North Carolina at Chapel Hill, with an office in 308 Hanes Building. His research interests include machine learning and data mining, statistical genomics, analysis of networks, and inference from dynamical systems. In addition to his primary academic appointment, he is a Full Member of the UNC Lineberger Comprehensive Cancer Center and a Member of the UNC Computational Genomics Program, appointments that reflect his interdisciplinary work in computational and genomic research.
Research topics
- Biology
- Genetics
- Computational biology
- Medicine
- Cell biology
- Evolutionary biology
- Pathology
Selected publications
Graph Disjointness with Applications to Reversible Markov Chains
ArXiv.org · 2026-03-03
article · Open access · Senior author
The correspondence between weighted undirected graphs and reversible Markov chains via vertex random walks is simple and well known. Leveraging this correspondence and ideas from the theory of dynamical systems, we study the structural discordance of graphs and Markov chains by means of graph joinings. Informally, a joining of graphs $G$ and $H$ is a graph on the product of their vertex sets giving rise to a coupling of their random walks. Graphs $G$ and $H$ are strongly disjoint if their only joining is the tensor product, and they are weakly disjoint if the degree function of every joining is equal to the degree function of the tensor product. We establish close connections between graph joinings, disjointness, and graph factors. Our first principal result characterizes weak disjointness of graphs in terms of the spectral overlap of their Markov transition matrices. The second establishes that two graphs without self loops are strongly disjoint if and only if they are weakly disjoint and exactly one of the graphs is a tree. The third shows that the strong or weak disjointness of graphs is essentially determined by their vertex and edge sets, without regard to edge weights. Translating these results into the language of Markov chains yields new insights into the rigidity and structure of reversible couplings of reversible Markov chains.
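The graph-to-chain correspondence mentioned in this abstract can be illustrated with a small sketch. It assumes the standard construction (transition probabilities proportional to edge weights, stationary distribution proportional to weighted degree) and the usual tensor product of weight matrices; the function names and the toy graphs are hypothetical, and the paper's own definitions of joinings may be more general.

```python
from fractions import Fraction


def random_walk(weights):
    """Return (P, pi) for the vertex random walk on a weighted undirected graph.

    weights: symmetric square matrix (list of lists) of non-negative integer
    edge weights; every vertex is assumed to have positive degree.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    total = sum(degrees)
    # Step from i to j with probability w(i, j) / d(i).
    P = [[Fraction(weights[i][j], degrees[i]) for j in range(n)] for i in range(n)]
    # The stationary distribution is proportional to weighted degree.
    pi = [Fraction(d, total) for d in degrees]
    return P, pi


def is_reversible(P, pi):
    """Check detailed balance: pi(i) P(i, j) == pi(j) P(j, i) for all i, j."""
    n = len(P)
    return all(pi[i] * P[i][j] == pi[j] * P[j][i]
               for i in range(n) for j in range(n))


def tensor_product(w_g, w_h):
    """Weights of the tensor product graph; pair (u, v) has index u * len(w_h) + v."""
    m, n = len(w_g), len(w_h)
    return [[w_g[u][x] * w_h[v][y] for x in range(m) for y in range(n)]
            for u in range(m) for v in range(n)]


# Toy inputs (assumptions for illustration): a weighted path and a single edge.
W_G = [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
W_H = [[0, 3], [3, 0]]

P_G, pi_G = random_walk(W_G)
P_H, pi_H = random_walk(W_H)
P_T, pi_T = random_walk(tensor_product(W_G, W_H))
```

In this sketch the walk on the tensor product moves the two coordinates independently, which is why it serves as the trivial joining against which disjointness is defined.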
A testing based approach to the discovery of differentially correlated variable sets
UNC Libraries · 2026-04-07
article · Open access · 1st author · Corresponding
Given data obtained under two sampling conditions, it is often of interest to identify variables that behave differently in one condition than in the other. We introduce a method for differential analysis of second-order behavior called Differential Correlation Mining (DCM). The DCM method identifies differentially correlated sets of variables, with the property that the average pairwise correlation between variables in a set is higher under one sample condition than the other. DCM is based on an iterative search procedure that adaptively updates the size and elements of a candidate variable set. Updates are performed via hypothesis testing of individual variables, based on the asymptotic distribution of their average differential correlation. We investigate the performance of DCM by applying it to simulated data as well as to recent experimental datasets in genomics and brain imaging.
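The quantity this abstract describes, the difference in average pairwise correlation of a variable set between two conditions, can be sketched as follows. This is only the score for a fixed set, not the DCM search or its hypothesis tests; the function names and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(data, variables):
    """Average correlation over all unordered pairs of the given variables.

    data: dict mapping variable name -> list of observations.
    """
    pairs = list(combinations(variables, 2))
    return sum(pearson(data[a], data[b]) for a, b in pairs) / len(pairs)


def differential_correlation(data_1, data_2, variables):
    """Mean pairwise correlation under condition 1 minus that under condition 2."""
    return (mean_pairwise_correlation(data_1, variables)
            - mean_pairwise_correlation(data_2, variables))


# Toy data (assumption): the set moves together in condition 1 but not in condition 2.
condition_1 = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 3, 5, 7]}
condition_2 = {"a": [1, 2, 3, 4], "b": [1, -1, -1, 1], "c": [1, -1, 1, -1]}

score = differential_correlation(condition_1, condition_2, ["a", "b", "c"])
```

A large positive score marks a set that is more tightly correlated under the first condition; the published method additionally decides which variables to add or drop by testing, which this sketch does not attempt.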
UNC Libraries · 2025-11-14
article · Open access
Few predictors of response to topical corticosteroid (tCS) treatment have been identified in eosinophilic esophagitis (EoE). We aimed to determine whether baseline gene expression predicts histologic response to tCS treatment for EoE. We analyzed prospectively collected samples from incident EoE cases who were treated with tCS for 8 weeks in a development cohort (prospective study) or in an independent validation cohort (clinical trial). Whole transcriptome RNA expression was determined from a baseline (pre-treatment) RNA-later preserved esophageal biopsy. Baseline expression was compared between histologic responders (<15 eos/hpf) and non-responders (≥15 eos/hpf), and differential correlation was used to assess baseline gene expression by response status. In 87 EoE cases analyzed in the development set, there were no differentially expressed genes associated with treatment response (at false discovery rate = 0.1). However, differential correlation identified a module of 22 genes with statistically significantly high pairwise correlation in non-responders (mean correlation coefficient = 0.7) compared to low correlation in responders (coefficient = 0.3). When this 22-gene module was applied to the 89 EoE cases in the independent cohort, it was not validated to predict tCS response at the 15 eos/hpf threshold (mean correlation coefficient = 0.32 in responders and 0.25 in nonresponders). Exploration of other thresholds also did not validate any modules. Though we identified a 22 gene differential correlation module measured pre-treatment that was strongly associated with subsequent histologic response to tCS in EoE, this was not validated in an independent population. Alternative methods to predict steroid response should be explored.
Alignment and comparison of directed networks via transition couplings of random walks
Journal of the Royal Statistical Society Series B (Statistical Methodology) · 2024-07-29
article · Senior author
We describe and study a transport-based procedure called network optimal transition coupling (NetOTC) for the comparison and alignment of two networks. The networks of interest may be directed or undirected, weighted or unweighted, and may have distinct vertex sets of different sizes. Given two networks and a cost function relating their vertices, NetOTC finds a transition coupling of their associated random walks having minimum expected cost. The minimizing cost quantifies the difference between the networks, while the optimal transport plan itself provides alignments of both the vertices and the edges of the two networks. Coupling of the full random walks, rather than their marginal distributions, ensures that NetOTC captures local and global information about the networks and preserves edges. NetOTC has no free parameters and does not rely on randomization. We investigate a number of theoretical properties of NetOTC and present experiments establishing its empirical performance.
Community Extraction in Multilayer Networks with Heterogeneous Community Structure.
Europe PMC (PubMed Central) · 2024-07-27 · 33 citations
article · 1st author · Corresponding
Multilayer networks are a useful way to capture and model multiple, binary or weighted relationships among a fixed group of objects. While community detection has proven to be a useful exploratory technique for the analysis of single-layer networks, the development of community detection methods for multilayer networks is still in its infancy. We propose and investigate a procedure, called Multilayer Extraction, that identifies densely connected vertex-layer sets in multilayer networks. Multilayer Extraction makes use of a significance based score that quantifies the connectivity of an observed vertex-layer set through comparison with a fixed degree random graph model. Multilayer Extraction directly handles networks with heterogeneous layers where community structure may be different from layer to layer. The procedure can capture overlapping communities, as well as background vertex-layer pairs that do not belong to any community. We establish consistency of the vertex-layer set optimizer of our proposed multilayer score under the multilayer stochastic block model. We investigate the performance of Multilayer Extraction on three applications and a test bed of simulations. Our theoretical and numerical evaluations suggest that Multilayer Extraction is an effective exploratory tool for analyzing complex multilayer networks. Code is publicly available at https://github.com/jdwilson4/MultilayerExtraction.
ACTOR: a latent Dirichlet model to compare expressed isoform proportions to a reference panel.
UNC Libraries · 2024-07-27
article · Open access
The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public data sets produced by genomic consortia as a reference, one can compare splicing patterns in a data set of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin, or disease status. We propose A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel (ACTOR), a latent Dirichlet model with Dirichlet Multinomial observations to compare expressed isoform proportions in a data set to an independent reference panel. We use a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression project as a reference data set, we evaluate ACTOR on simulated and real RNA-seq data sets to determine tissue-type classifications of genes. ACTOR is publicly available as an R package at https://github.com/mccabes292/actor.
Network Neuroscience · 2024-01-01 · 4 citations
article · Open access · Senior author
Despite the widespread exploration and availability of parcellations for the functional connectome, parcellations designed for the structural connectome are comparatively limited. Current research suggests that there may be no single "correct" parcellation and that the human brain is intrinsically a multiresolution entity. In this work, we propose the Continuous Structural Connectivity-based, Nested (CoCoNest) family of parcellations, a fully data-driven, multiresolution family of parcellations derived from structural connectome data. The CoCoNest family is created using agglomerative (bottom-up) clustering and error-complexity pruning, which strikes a balance between the complexity of each parcellation and how well it preserves patterns in vertex-level, high-resolution connectivity data. We draw on a comprehensive battery of internal and external evaluation metrics to show that the CoCoNest family is competitive with or outperforms widely used parcellations in the literature. Additionally, we show how the CoCoNest family can serve as an exploratory tool for researchers to investigate the multiresolution organization of the structural connectome.
Community modulated recursive trees and population dependent branching processes
UNC Libraries · 2024-08-29 · 1 citation
article · Open access · 1st author · Corresponding
We consider random recursive trees that are grown via community modulated schemes that involve random attachment or degree based attachment. The aim of this article is to derive general techniques based on continuous time embedding to study such models. The associated continuous time embeddings are not branching processes: individual reproductive rates at each time t depend on the composition of the entire population at that time, and hence vertices do not reproduce independently. Using stochastic analytic techniques we show that various key macroscopic statistics of the continuous time embedding stabilize, allowing asymptotics for a host of functionals of the original models to be derived.
Estimation of stationary optimal transport plans
Information and Inference A Journal of the IMA · 2024-04-01 · 2 citations
article · Open access
We study optimal transport for stationary stochastic processes taking values in finite spaces. In order to reflect the stationarity of the underlying processes, we restrict attention to stationary couplings, also known as joinings. The resulting optimal joining problem captures differences in the long-run average behavior of the processes of interest. We introduce estimators of both optimal joinings and the optimal joining cost, and establish consistency of the estimators under mild conditions. Furthermore, under stronger mixing assumptions we establish finite-sample error rates for the estimated optimal joining cost that extend the best known results in the iid case. We also extend the consistency and rate analysis to an entropy-penalized version of the optimal joining problem. Finally, we validate our convergence results empirically as well as demonstrate the computational advantage of the entropic problem in a simulation experiment.
Control of false discoveries in grouped hypothesis testing for eQTL data
BMC Bioinformatics · 2024-04-11
article · Open access
BACKGROUND: Expression quantitative trait locus (eQTL) analysis aims to detect the genetic variants that influence the expression of one or more genes. Gene-level eQTL testing forms a natural grouped-hypothesis testing strategy with clear biological importance. Methods to control family-wise error rate or false discovery rate for group testing have been proposed earlier, but may not be powerful or easily apply to eQTL data, for which certain structured alternatives may be defensible and may enable the researcher to avoid overly conservative approaches. RESULTS: In an empirical Bayesian setting, we propose a new method to control the false discovery rate (FDR) for grouped hypotheses. Here, each gene forms a group, with SNPs annotated to the gene corresponding to individual hypotheses. The heterogeneity of effect sizes in different groups is considered by the introduction of a random effects component. Our method, entitled Random Effects model and testing procedure for Group-level FDR control (REG-FDR), assumes a model for alternative hypotheses for the eQTL data and controls the FDR by adaptive thresholding. As a convenient alternate approach, we also propose Z-REG-FDR, an approximate version of REG-FDR, that uses only Z-statistics of association between genotype and expression for each gene-SNP pair. The performance of Z-REG-FDR is evaluated using both simulated and real data. Simulations demonstrate that Z-REG-FDR performs similarly to REG-FDR, but with much improved computational speed. CONCLUSION: Our results demonstrate that the Z-REG-FDR method performs favorably compared to other methods in terms of statistical power and control of FDR. It can be of great practical use for grouped hypothesis testing for eQTL analysis or similar problems in statistical genomics due to its fast computation and ability to be fit using only summary data.
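For readers unfamiliar with FDR control, the sketch below shows the classical Benjamini-Hochberg step-up procedure applied to a list of p-values. It is not REG-FDR or Z-REG-FDR, which rely on an empirical Bayes model not described in enough detail here to reproduce; the function name and the p-values (imagined as one per gene) are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    Rejects the k smallest p-values, where k is the largest rank such that
    p_(k) <= alpha * k / m, with m the number of hypotheses.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])


# Hypothetical gene-level p-values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)
```

The threshold adapts to the data through the rank of each p-value, which is the general sense of adaptive thresholding; the method in the abstract derives its thresholds from a fitted model instead.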
Recent grants
NIH · $791k · 2020
Significance Based Procedures for Mining and Prediction of Large Data Sets
NSF · $210k · 2009–2013
NIH · $1.3M · 2017
Optimality Landscapes and Exploratory Data Analysis
NSF · $270k · 2013–2017
Inference for Stationary Processes: Optimal Transport and Generalized Bayesian Approaches
NSF · $300k · 2021–2025
Frequent coauthors
- Fred A. Wright · 47 shared
- Tuuli Lappalainen · Science for Life Laboratory · 42 shared
- Ayellet V. Segrè · Broad Institute · 41 shared
- Charles M. Perou · UNC Lineberger Comprehensive Cancer Center · 40 shared
- Gad Getz · 39 shared
- Roderic Guigó · Pompeu Fabra University · 38 shared
- Sarah Kim-Hellmuth · 37 shared
- Joel S. Parker · University of North Carolina at Chapel Hill · 35 shared
Awards & honors
- Andrew B. Nobel Distinguished Professor