About
Erin Conlon is a professor in the Department of Mathematics and Statistics at the University of Massachusetts Amherst, affiliated with the Newton Mount Ida Campus in the Boston area. She is involved in the Statistics Graduate Program, which offers a flexible Master's degree in Statistics with options for in-person evening classes and remote learning; the degree is awarded by the University of Massachusetts Amherst. She is also connected to the Boston-Area Data Science Certificate, a joint offering between Statistics and Computer Science that can be completed fully online.
Professor Conlon holds a Ph.D. and an M.S. in Biostatistics from the University of Minnesota and a B.S. in Mathematics from the University of Wisconsin, Madison. Her professional history includes postdoctoral fellowships in the Department of Statistics at Harvard University and in Statistical Genetics at the University of Washington, Seattle, as well as a visiting scholar position in Functional Genomics at the Institute for Pure and Applied Mathematics at UCLA.
Her research focuses on developing Bayesian statistical methods for data science, big data, and analytics, an area in which she collaborates with researchers such as Xiaojing Wang, Zheng Wei, and Alexey Miroshnikov. Her interests also extend to statistical methods in genomics and bioinformatics, including gene expression and DNA sequence analysis, Bayesian models for genomic data, and comparative genomics. Current work includes systems-biology approaches to studying regulatory and metabolic networks of microbes, in collaboration with Kristen DeAngelis's lab, and statistical and bioinformatic methods for breast cancer gene expression studies with Joseph Jerry's lab.
Other collaborative projects involve microbial organisms such as Prochlorococcus marinus, Geobacter, and Bacillus subtilis, working with researchers Jeffrey Blanchard, Derek Lovley, and Richard Losick. She has also developed software, including the R package parallelMCMCcombine, which supports Bayesian methods for big data and analytics.
Research topics
- Computer Science
- Mathematics
- Statistics
- Data Mining
- Machine Learning
- Artificial Intelligence
- Econometrics
- Algorithm
- Applied mathematics
- Economics
Selected publications
A Bayesian approach to the analysis of asymmetric association for two-way contingency tables
Computational Statistics · 2021 · 1 citation
Senior author · Corresponding
- Computer Science
- Data Mining
Asymmetric dependence in the stochastic frontier model using skew normal copula
International Journal of Approximate Reasoning · 2020 · 13 citations
- Computer Science
- Econometrics
- Mathematics
Gene expression signature of atypical breast hyperplasia and regulation by SFRP1
Breast Cancer Research · 2019-06-27 · 27 citations
article · Open access
BACKGROUND: Atypical breast hyperplasias (AH) have a 10-year risk of progression to invasive cancer estimated at 4-7%, with the overall risk of developing breast cancer increased by ~4-fold. AH lesions are estrogen receptor alpha positive (ERα+) and represent risk indicators and/or precursor lesions to low grade ERα+ tumors. Therefore, molecular profiles of AH lesions offer insights into the earliest changes in the breast epithelium, rendering it susceptible to oncogenic transformation.
METHODS: In this study, women were selected who were diagnosed with ductal or lobular AH, but no breast cancer prior to or within the 2-year follow-up. Paired AH and histologically normal benign (HNB) tissues from patients were microdissected. RNA was isolated, amplified linearly, labeled, and hybridized to whole transcriptome microarrays to determine gene expression profiles. Genes that were differentially expressed between AH and HNB were identified using a paired analysis. Gene expression signatures distinguishing AH and HNB were defined using AGNES and PAM methods. Regulation of gene networks was investigated using breast epithelial cell lines, explant cultures of normal breast tissue and mouse tissues.
RESULTS: A 99-gene signature discriminated the histologically normal and AH tissues in 81% of the cases. Network analysis identified coordinated alterations in signaling through ERα, epidermal growth factor receptors, and androgen receptor which were associated with the development of both lobular and ductal AH. Expression of SFRP1 was also consistently lower in AH. Knockdown of SFRP1 in 76N-Tert cells resulted in altered expression of 13 genes similarly to that observed in AH. An SFRP1-regulated network was also observed in tissues from mice lacking Sfrp1. Re-expression of SFRP1 in MCF7 cells provided further support for the SFRP1-regulated network. Treatment of breast explant cultures with rSFRP1 dampened estrogen-induced progesterone receptor levels.
CONCLUSIONS: The alterations in gene expression were observed in both ductal and lobular AH suggesting shared underlying mechanisms predisposing to AH. Loss of SFRP1 expression is a significant regulator of AH transcriptional profiles driving previously unidentified changes affecting responses to estrogen and possibly other pathways. The gene signature and pathways provide insights into alterations contributing to AH breast lesions.
Figshare · 2019-01-01
dataset · Open access
Table S1. Probesets that are differentially expressed (1039 probesets). Table S2. Probesets selected by p < 0.005 used for hierarchical clustering by AGNES (99 genes). Table S3. Probesets selected by PAM (139 genes). Table S4. Zero-order gene network. Table S5. Primers for RT-qPCR. (XLSX 204 kb)
Parallel Markov chain Monte Carlo for Bayesian hierarchical models with big data, in two stages
Journal of Applied Statistics · 2019-01-29 · 4 citations
article · Senior author · Corresponding
Due to the escalating growth of big data sets in recent years, new Bayesian Markov chain Monte Carlo (MCMC) parallel computing methods have been developed. These methods partition large data sets by observations into subsets. However, for Bayesian nested hierarchical models, typically only a few parameters are common for the full data set, with most parameters being group specific. Thus, parallel Bayesian MCMC methods that take into account the structure of the model and split the full data set by groups rather than by observations are a more natural approach for analysis. Here, we adapt and extend a recently introduced two-stage Bayesian hierarchical modeling approach, and we partition complete data sets by groups. In stage 1, the group-specific parameters are estimated independently in parallel. The stage 1 posteriors are used as proposal distributions in stage 2, where the target distribution is the full model. Using three-level and four-level models, we show in both simulation and real data studies that results of our method agree closely with the full data analysis, with greatly increased MCMC efficiency and greatly reduced computation times. The advantages of our method versus existing parallel MCMC computing methods are also described.
Genome Sequence of Verrucomicrobium sp. Strain GAS474, a Novel Bacterium Isolated from Soil
Genome Announcements · 2018-01-24 · 11 citations
article · Open access
Verrucomicrobium sp. strain GAS474 was isolated from the mineral soil of a temperate deciduous forest in central Massachusetts. Here, we present the complete genome sequence of this phylogenetically novel organism, which consists of a total of 3,763,444 bp on a single scaffold, with a 65.8% GC content and 3,273 predicted open reading frames.
Parallel Markov chain Monte Carlo for Bayesian dynamic item response models in educational testing
Stat · 2017-01-01 · 3 citations
article · Senior author · Corresponding
Bayesian dynamic item response models have been successfully used for educational testing data; these models are especially useful for individually varying and irregularly spaced longitudinal testing data. However, because of the complexity of the models and the large size of the data sets, computation time is excessive for carrying out full data analyses in practice. Here, we introduce a parallel Markov chain Monte Carlo method to speed the implementation of these Bayesian models. Using both simulation data and real educational testing data for reading ability, we demonstrate that computation time is greatly reduced for our parallel computing method versus full data analyses. The estimated error of our method is shown to be small, using common distance metrics. Our parallel computing approach can be used for other models in the Educational and Psychometric fields, including Bayesian item response theory models.
arXiv (Cornell University) · 2017-10-25
preprint · Open access · Senior author
In this article we perform an asymptotic analysis of parallel Bayesian logspline density estimators. Such estimators are useful for the analysis of datasets that are partitioned into subsets and stored in separate databases without the capability of accessing the full dataset from a single computer. The parallel estimator we introduce is in the spirit of a kernel density estimator introduced in recent studies. We provide a numerical procedure that produces the normalized density estimator itself in place of the sampling algorithm. We then derive an error bound for the mean integrated squared error of the full dataset posterior estimator. The error bound depends upon the parameters that arise in logspline density estimation and the numerical approximation procedure. In our analysis, we identify the choices for the parameters that result in the error bound scaling optimally in relation to the number of samples. This provides our method with increased estimation accuracy, while also minimizing the computational cost.
arXiv (Cornell University) · 2017-10-25
preprint · Open access · Senior author
In this article we perform an asymptotic analysis of Bayesian parallel density estimators which are based on logspline density estimation. The parallel estimator we introduce is in the spirit of a kernel density estimator introduced in recent studies. We provide a numerical procedure that produces the density estimator itself in place of the sampling algorithm. We then derive an error bound for the mean integrated squared error for the full data posterior density estimator. We also investigate the parameters that arise from logspline density estimation and the numerical approximation procedure. Our investigation identifies specific choices of parameters for logspline density estimation that result in the error bound scaling appropriately in relation to these choices.
Parallel Markov Chain Monte Carlo for Bayesian Hierarchical Models with Big Data, in Two Stages
arXiv (Cornell University) · 2017-12-16
preprint · Open access · Senior author
Due to the escalating growth of big data sets in recent years, new Bayesian Markov chain Monte Carlo (MCMC) parallel computing methods have been developed. These methods partition large data sets by observations into subsets. However, for Bayesian nested hierarchical models, typically only a few parameters are common for the full data set, with most parameters being group-specific. Thus, parallel Bayesian MCMC methods that take into account the structure of the model and split the full data set by groups rather than by observations are a more natural approach for analysis. Here, we adapt and extend a recently introduced two-stage Bayesian hierarchical modeling approach, and we partition complete data sets by groups. In stage 1, the group-specific parameters are estimated independently in parallel. The stage 1 posteriors are used as proposal distributions in stage 2, where the target distribution is the full model. Using three-level and four-level models, we show in both simulation and real data studies that results of our method agree closely with the full data analysis, with greatly increased MCMC efficiency and greatly reduced computation times. The advantages of our method versus existing parallel MCMC computing methods are also described.
Recent grants
NIH · $81k · 2003
Frequent coauthors
- Jun S. Liu · 13 shared
- Jason D. Lieb · University of Chicago · 9 shared
- X. Shirley Liu · G1 Therapeutics (United States) · 9 shared
- Zheng Wei · Texas A&M University – Corpus Christi · 6 shared
- Patrick Eichenberger · New York University · 5 shared
- Alexey Miroshnikov · Discover Financial Services (United States) · 5 shared
- Ellen M. Wijsman · University of Washington · 3 shared
- Richard Losick · Harvard University · 3 shared