
Anastasios Kyrillidis
Noah Harding Associate Professor, Computer Science · Member, Ken Kennedy Institute · Rice University · Computer Science
Active 2011–2026
About
Anastasios Kyrillidis is currently a Noah Harding Associate Professor in the Computer Science department at Rice University. His research areas include optimization for machine learning, convex and non-convex algorithms and analysis, and large-scale optimization, focusing on problems that involve a math-driven criterion and require efficient solutions. He completed his PhD at the CS Department of EPFL in Switzerland and has held positions as a Goldstine Postdoctoral Fellow at the IBM T. J. Watson Research Center and as a Simons Foundation postdoctoral member at the University of Texas at Austin. His educational background includes a Master's Degree and a Diploma in Electrical and Electronics Engineering from the Technical University of Crete. His honors include a Goldstine fellowship at IBM and a scholarship from the Simons Foundation for his postdoctoral studies.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Physics
- Computational biology
- Data science
- Distributed computing
- Biology
- Computer network
Selected publications
A Catalyst Framework for the Quantum Linear System Problem via the Proximal Point Algorithm
Proceedings of the AAAI Conference on Artificial Intelligence · 2026-03-14
article · Open access · Senior author
Solving systems of linear equations is a fundamental problem, but it can be computationally intensive for classical algorithms in high dimensions. Existing quantum algorithms can achieve exponential speedups for the quantum linear system problem (QLSP) in terms of the problem dimension, but the advantage is bottlenecked by the condition number of the coefficient matrix. In this work, we propose a new quantum algorithm for QLSP inspired by the classical proximal point algorithm (PPA). Our proposed method can be viewed as a meta-algorithm that allows inverting a modified matrix via an existing QLSP solver, thereby directly approximating the solution vector instead of approximating the inverse of the coefficient matrix. By carefully choosing the step size $\eta$, the proposed algorithm can effectively precondition the linear system to mitigate the dependence on condition numbers that hindered the applicability of previous approaches. Importantly, this is the first iterative framework for QLSP where a tunable parameter $\eta$ and initialization $x_0$ allow controlling the trade-off between the runtime and approximation error.
arXiv (Cornell University) · 2026-02-11
article · Open access · Senior author
While Mamba2's expanded state dimension enhances temporal modeling, it incurs substantial inference overhead that saturates bandwidth during autoregressive generation. Standard pruning methods fail to address this bottleneck: unstructured sparsity leaves activations dense, magnitude-based selection ignores runtime dynamics, and gradient-based methods impose prohibitive costs. We introduce GHOST (Grouped Hidden-state Output-aware Selection and Truncation), a structured pruning framework that approximates control-theoretic balanced truncation using only forward-pass statistics. By jointly measuring controllability and observability, GHOST rivals the fidelity of gradient-based methods without requiring backpropagation. As a highlight, on models ranging from 130M to 2.7B parameters, our approach achieves a 50% state-dimension reduction with an increase of approximately 1 perplexity point on WikiText-2. Code is available at https://anonymous.4open.science/r/mamba2_ghost-7BCB/.
Convergence Analysis of Two-Layer Neural Networks under Gaussian Input Masking
ArXiv.org · 2026-02-19
article · Open access · Senior author
We investigate the convergence guarantee of two-layer neural network training with Gaussian randomly masked inputs. This scenario corresponds to Gaussian dropout at the input level, or noisy input training common in sensor networks, privacy-preserving training, and federated learning, where each user may have access to partial or corrupted features. Using a Neural Tangent Kernel (NTK) analysis, we demonstrate that training a two-layer ReLU network with Gaussian randomly masked inputs achieves linear convergence up to an error region proportional to the mask's variance. A key technical contribution is resolving the randomness within the non-linear activation, a problem of independent interest.
Exploiting Low-Rank Structure in Max-K-Cut Problems
Open MIND · 2026-02-23
preprint · Senior author
We approach the Max-3-Cut problem through the lens of maximizing complex-valued quadratic forms and demonstrate that low-rank structure in the objective matrix can be exploited, leading to alternative algorithms to classical semidefinite programming (SDP) relaxations and heuristic techniques. We propose an algorithm for maximizing these quadratic forms over a domain of size $K$ that enumerates and evaluates a set of $O\left(n^{2r-1}\right)$ candidate solutions, where $n$ is the dimension of the matrix and $r$ represents the rank of an approximation of the objective. We prove that this candidate set is guaranteed to include the exact maximizer when $K=3$ (corresponding to Max-3-Cut) and the objective is low-rank, and provide approximation guarantees when the objective is a perturbation of a low-rank matrix. This construction results in a family of novel, inherently parallelizable and theoretically-motivated algorithms for Max-3-Cut. Extensive experimental results demonstrate that our approach achieves performance comparable to existing algorithms across a wide range of graphs, while being highly scalable.
Convergence Analysis of Two-Layer Neural Networks under Gaussian Input Masking
Open MIND · 2026-02-19
preprint · Senior author
Exploiting Low-Rank Structure in Max-K-Cut Problems
ArXiv.org · 2026-02-23
article · Open access · Senior author
Open MIND · 2026-02-11
preprint · Senior author
We introduce GHOST (Grouped Hidden-state Output-aware Selection and Truncation), a structured pruning framework that approximates control-theoretic balanced truncation using only forward-pass statistics.
Three Birds with One Stone: Improving Performance, Convergence, and System Throughput with NEST
Proceedings of the ACM on Measurement and Analysis of Computing Systems · 2025-12-01
article · Open access
Variational quantum algorithms (VQAs) have the potential to demonstrate quantum utility on near-term quantum computers. However, these algorithms often get executed on the highest-fidelity qubits and computers to achieve the best performance, causing low system throughput. Recent efforts have shown that VQAs can be run on low-fidelity qubits initially and high-fidelity qubits later on to still achieve good performance. We take this effort forward and show that carefully varying the qubit fidelity map of the VQA over its execution using our technique, Nest, does not just (1) improve performance (i.e., help achieve close to optimal results), but also (2) leads to faster convergence. We also use Nest to co-locate multiple VQAs concurrently on the same computer, thus (3) increasing the system throughput, and therefore balancing and optimizing three conflicting metrics simultaneously.
Exploring How LLMs Capture and Represent Domain-Specific Knowledge
ArXiv.org · 2025-04-23
preprint · Open access
We study whether Large Language Models (LLMs) inherently capture domain-specific nuances in natural language. Our experiments probe the domain sensitivity of LLMs by examining their ability to distinguish queries from different domains using hidden states generated during the prefill phase. We reveal latent domain-related trajectories that indicate the model's internal recognition of query domains. We also study the robustness of these domain representations to variations in prompt styles and sources. Our approach leverages these representations for model selection, mapping the LLM that best matches the domain trace of the input query (i.e., the model with the highest performance on similar traces). Our findings show that LLMs can differentiate queries for related domains, and that the fine-tuned model is not always the most accurate. Unlike previous work, our interpretations apply to both closed and open-ended generative tasks.
Acta Crystallographica Section D Structural Biology · 2025-11-19 · 1 citation
article · Open access
Protein structure determination has long been one of the primary challenges of structural biology, to which deep machine learning (ML)-based approaches have increasingly been applied. However, these ML models generally do not directly incorporate the experimental measurements, such as X-ray crystallographic diffraction data. To this end, we explore an approach that more tightly couples these traditional crystallographic and recent ML-based methods by training a hybrid 3D vision transformer and convolutional network on inputs from both domains. We make use of two distinct input constructs: Patterson maps, which are directly obtainable from crystallographic data, and 'partial structure' template maps derived from predicted structures deposited in the AlphaFold Protein Structure Database with subsequently omitted residues. With these, we predict electron-density maps that are then post-processed into atomic models through standard crystallographic refinement processes. Introducing an initial data set of small protein fragments taken from Protein Data Bank entries and placing them in hypothetical crystal settings, we demonstrate that our method is effective at both improving the phases of the crystallographic structure factors and completing the regions missing from partial structure templates, as well as improving the agreement of the electron-density maps with the ground-truth atomic structures.
Recent grants
FET: Small: Collaborative Research: Efficient and Robust Characterization of Quantum Systems
NSF · $470k · 2019–2023
CAREER: Algorithmic foundations for practical acceleration in computational sciences
NSF · $658k · 2022–2027
Frequent coauthors
- Volkan Cevher · 50 shared
- Sujay Sanghavi · 21 shared
- Constantine Caramanis · 18 shared
- C. Wolfe, Rice University · 17 shared
- Chen Dun, Johns Hopkins University · 13 shared
- John Chen, California Polytechnic State University · 13 shared
- Dohyung Park, Samsung (South Korea) · 13 shared
- Junhyung Lyle Kim · 12 shared
Education
- 2014: Ph.D. in Communications and Computer Science, École Polytechnique Fédérale de Lausanne
- 2010: M.Sc. in Computer Science, Department of Electrical and Computer Engineering, Technical University of Crete
- 2008: Diploma, Department of Electrical and Computer Engineering, Technical University of Crete
Awards & honors
- Goldstine fellowship at IBM among more than 100 applicants
- Simons Foundation scholarship for PostDoc studies at UT Aust…
- Alexander S. Onassis Public Benefit Foundation