
Vince Lyzinski
Associate Professor · University of Maryland, College Park · Statistics
Active 2011–2026
About
Vince Lyzinski is a Professor in the Department of Mathematics at the University of Maryland, College Park, with additional affiliations in the Statistics Program and the Applied Mathematics, Statistics, and Scientific Computation program. He received his BSc degree in mathematics from the University of Notre Dame in 2006 and began graduate studies in pure mathematics at Johns Hopkins University in 2006 before shifting to applied mathematics and statistics. He earned his Ph.D. in Applied Mathematics and Statistics from Johns Hopkins University in 2013 under the supervision of Professor Jim Fill. His dissertation focused on intertwinings, interlacing eigenvalues, and strong stationary duality for diffusions. After completing his Ph.D., he held a postdoctoral position and a senior research scientist role at Johns Hopkins University, followed by an assistant professorship at the University of Massachusetts, Amherst, before joining the University of Maryland faculty in 2019. Professor Lyzinski's research interests include statistical network inference, graph matching theory and algorithms, statistical machine learning, Markov chains, probability, and combinatorics. His work has been supported by agencies such as AFOSR, DARPA, NIH, and the Johns Hopkins Human Language Technology Center of Excellence. His research contributions span areas such as network de-anonymization, vertex nomination, clustering, testing, classification in networks, and the development of novel algorithms and theoretical frameworks for graph matching and network inference. He has advised numerous Ph.D. students and postdoctoral researchers, contributing to the advancement of knowledge in applied mathematics and statistics related to network data analysis.
Research topics
- Computer science
- Mathematics
- Combinatorics
- Theoretical computer science
- Algorithm
Selected publications
Vertex misalignment and changepoint localization in network time series
arXiv (Cornell University) · 2026-04-22
article · Open access
Inference for time series of networks often relies on accurate vertex correspondence between network realizations at different times. In practice, however, such vertex alignments can be misspecified or unknown. We study the impact of vertex alignment on changepoint localization for dynamic networks through two illustrative models, each with a similar changepoint, with the key distinction being whether changepoint information is contained in marginal or joint distributions of the time-varying latent positions. We compare localization techniques ranging from the simple network statistic of average degree to the modern procedure of Euclidean mirrors. In one model, vertex misalignment causes little error, and in the other, it impairs localization in ways that cannot be corrected through graph matching or optimal transport, which we show are closely related in this setting. Our results demonstrate that robust network inference necessitates reckoning with the subtle interplay of marginal and joint information in the observed network time series.
Matching and mixing: Matchability of graphs under Markovian error
ArXiv.org · 2026-01-27
article · Open access · Senior author
We consider the problem of graph matching for a sequence of graphs generated under a time-dependent Markov chain noise model. Our edgelighter error model, a variant of the classical lamplighter random walk, iteratively corrupts the graph $G_0$ with edge-dependent noise, creating a sequence of noisy graph copies $(G_t)$. Much of the graph matching literature is focused on anonymization thresholds in edge-independent noise settings, and we establish novel anonymization thresholds in this edge-dependent noise setting when matching $G_0$ and $G_t$. Moreover, we also compare this anonymization threshold with the mixing properties of the Markov chain noise model. We show that when $G_0$ is drawn from an Erdős-Rényi model, the graph matching anonymization threshold and the mixing time of the edgelighter walk are both of order $\Theta(n^2\log n)$. We further demonstrate that for more structured models for $G_0$ (e.g., the Stochastic Block Model), graph matching anonymization can occur in $O(n^\alpha\log n)$ time for some $\alpha<2$, indicating that anonymization can occur before the Markov chain noise model globally mixes. Through extensive simulations, we verify our theoretical bounds in the settings of Erdős-Rényi random graphs and stochastic block model random graphs, and explore our findings on real-world datasets derived from a Facebook friendship network and a European research institution email communication network.
Data Kernel Perspective Space Performance Guarantees for Synthetic Data from Transformer Models
Open MIND · 2026-02-04
preprint
Scarcity of labeled training data remains the long pole in the tent for building performant language technology and generative AI models. Transformer models -- particularly LLMs -- are increasingly being used to mitigate the data scarcity problem via synthetic data generation. However, because the models are black boxes, the properties of the synthetic data are difficult to predict. In practice it is common for language technology engineers to 'fiddle' with the LLM temperature setting and hope that what comes out the other end improves the downstream model. Faced with this uncertainty, here we propose Data Kernel Perspective Space (DKPS) to provide the foundation for mathematical analysis yielding concrete statistical guarantees for the quality of the outputs of transformer models. We first show the mathematical derivation of DKPS and how it provides performance guarantees. Next we show how DKPS performance guarantees can elucidate performance of a downstream task, such as neural machine translation models or LLMs trained using Contrastive Preference Optimization (CPO). Limitations of the current work and future research are also discussed.
Data Kernel Perspective Space Performance Guarantees for Synthetic Data from Transformer Models
arXiv (Cornell University) · 2026-02-04
article · Open access
Matching and mixing: Matchability of graphs under Markovian error
Open MIND · 2026-01-27
preprint · Senior author
Optimizing the Induced Correlation in Omnibus Joint Graph Embeddings
Figshare · 2026-04-02
article · Open access · Senior author
Theoretical and empirical evidence suggests that joint graph embedding algorithms induce correlation across networks in the embedding space. In the Omnibus joint graph embedding framework, previous results delineated the dual effects of algorithm-induced and model-inherent correlations on the total correlation across embedded networks. Accounting for the algorithm-induced correlation is practically important, as suboptimal Omnibus constructions can lead to inferential losses. This work presents the first efforts to automate the Omnibus construction in order to address two key questions: the correlation–to–Omni problem and the flat correlation problem. In the flat correlation problem, we seek the minimum algorithm-induced flat correlation (i.e., the same across all graph pairs) produced via an Omnibus embedding, as minimal flat correlation best preserves individual graph structure in the embedding space. Working in a subspace of the fully general Omnibus matrices, we prove both a lower bound for this flat correlation and that the classical Omnibus construction induces maximal flat correlation. In the correlation–to–Omni problem, we present the corr2Omni algorithm to construct Omnibus embeddings that best preserve a given matrix of estimated pairwise graph correlations in the embedding space. In simulated and real data settings, we demonstrate the increased effectiveness of corr2Omni versus the classical Omnibus construction.
Optimizing the Induced Correlation in Omnibus Joint Graph Embeddings
Journal of Computational and Graphical Statistics · 2026-04-02
article · Senior author · Corresponding
IEEE Transactions on Pattern Analysis and Machine Intelligence · 2025-12-12
article · Senior author
In this article, we explore the capability of both the Adjacency Spectral Embedding (ASE) and the Graph Encoder Embedding (GEE) for capturing an embedded pseudo-clique structure in the random dot product graph setting. In both theory and experiments, we demonstrate that, in the absence of additional clean (i.e., without the implanted pseudo-clique) network data, this pairing of model and methods can yield worse results than the best existing spectral clique detection methods. However, these methods can be used to asymptotically localize the pseudo-cliques if additional clean, independent network data is provided. This demonstrates at once the methods' potential ability/inability to capture modestly sized pseudo-cliques and the methods' robustness to the model contamination giving rise to the pseudo-clique structure. To further enrich our analysis, we also consider the Variational Graph Auto-Encoder (VGAE) model in our simulation and real data experiments.
Frequent coauthors
- Carey E. Priebe · 106 shared
- Youngser Park · 38 shared
- Donniell E. Fishkind · 36 shared
- Daniel L. Sussman, Boston University · 34 shared
- Avanti Athreya · 33 shared
- Minh Tang · 29 shared
- Keith Levin, University of Wisconsin–Madison · 18 shared
- Joshua T Vogelstein, Johns Hopkins University · 17 shared
Labs
Statistical network inference, graph matching, statistical machine learning, Markov chains, probability, and combinatorics