
Ümit V. Çatalyürek
VerifiedGeorgia Institute of Technology · Computer Science
Active 1996–2025
About
Ümit V. Çatalyürek is a Professor in the School of Computational Science and Engineering in the College of Computing at the Georgia Institute of Technology. Prior to joining Georgia Tech, he was a Professor and Vice Chair of the Department of Biomedical Informatics, and a Professor in the Departments of Electrical & Computer Engineering, and Computer Science & Engineering at the Ohio State University. He received his Ph.D., M.S., and B.S. in Computer Engineering and Information Science from Bilkent University, Turkey, in 2000, 1994, and 1992, respectively. Dr. Çatalyürek is a Fellow of IEEE and SIAM, and has served as the elected Chair for IEEE TCPP for 2016-2019 and as Vice-Chair for ACM SIGBio for 2015-2021. He also serves as a member of the Board of Trustees of Bilkent University and as the Editor-in-Chief for Parallel Computing. His past editorial roles include positions on the editorial boards of IEEE Transactions on Parallel and Distributed Computing Systems, SIAM Journal of Scientific Computing, Journal of Parallel and Distributed Computing, and Network Modeling and Analysis in Health Informatics and Bioinformatics. Dr. Çatalyürek is a recipient of an NSF CAREER award and is the primary investigator of several awards from the Department of Energy, the National Institute of Health, and the National Science Foundation. His main research areas include parallel computing, combinatorial scientific computing, and biomedical informatics, and he has co-authored more than 200 peer-reviewed articles, invited book chapters, and papers.
Research topics
- Computer science
- Parallel computing
- Theoretical computer science
- Distributed computing
- Algorithm
Selected publications
A Scalable and Effective Alternative to Graph Transformers
Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11 · 2 citations
articleOpen accessSenior authorGraph Neural Networks (GNNs) have shown impressive performance in graph representation learning, but they face challenges in capturing long-range dependencies due to their limited expressive power. To address this, Graph Transformers (GTs) were introduced, utilizing self-attention mechanism to effectively model pairwise node relationships. Despite their advantages, GTs suffer from quadratic complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs. In this work, we present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs that leverages neighborhood propagation and global convolutions to effectively capture local and global dependencies in quasiliniear time. Our study on synthetic datasets reveals that GECO reaches 169x speedup on a graph with 2M nodes w.r.t. optimized attention. Further evaluations on diverse range of benchmarks showcase that it scales to large graphs where traditional GTs often face memory and time limitations. Notably, GECO consistently achieves comparable or superior quality compared to baselines, improving the SOTA up to 4.5%, and offering a scalable and effective solution for large-scale graph learning.
A Scalable and Effective Alternative to Graph Transformers
arXiv (Cornell University) · 2024-06-17
preprintOpen accessSenior authorGraph Neural Networks (GNNs) have shown impressive performance in graph representation learning, but they face challenges in capturing long-range dependencies due to their limited expressive power. To address this, Graph Transformers (GTs) were introduced, utilizing self-attention mechanism to effectively model pairwise node relationships. Despite their advantages, GTs suffer from quadratic complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs. In this work, we present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs that leverages neighborhood propagation and global convolutions to effectively capture local and global dependencies in quasilinear time. Our study on synthetic datasets reveals that GECO reaches 169x speedup on a graph with 2M nodes w.r.t. optimized attention. Further evaluations on diverse range of benchmarks showcase that GECO scales to large graphs where traditional GTs often face memory and time limitations. Notably, GECO consistently achieves comparable or superior quality compared to baselines, improving the SOTA up to 4.5%, and offering a scalable and effective solution for large-scale graph learning.
Finding Temporal Dense Regions of Temporal Graphs
2023-03-01
articleOpen accessDense regions lend insight to static graphs Many graphs today are changing Can we get insight in temporal graphs?Example question: can we find a group in a company and track it through its lifetime, as it grows, shrinks, and changes?BLUF: We provide a new temporal dense region and show it captures important components of such changing graphs C. Elegans neurons, core in blue.Core has most important neurons
Optimizing Irregular Dense Operators of Heterogeneous GNN Models on GPU
2023-05-01 · 3 citations
articleGNN models on heterogeneous graphs have achieved state-of-the-art (SOTA) performance in various graph tasks such as link prediction and node classification. Despite their success in providing SOTA results, popular GNN libraries, such as PyG and DGL, fail to provide fast and efficient solutions for heterogeneous GNN models. One common key bottlenecks of models like RGAT, RGCN, and HGT is relation-specific linear projection. In this paper, we propose two high-performing tensor operators: gather-mm and segment-mm to address the issue. We demonstrate the effectiveness of the proposed operators in training two popular heterogeneous GNN models – RGCN and HGT. Our proposed approaches outperform the full-batch training time of RGCN by up to 3× and mini-batch by up to 2×.
Cooperative Minibatching in Graph Neural Networks
arXiv (Cornell University) · 2023-10-19
preprintOpen accessSenior authorTraining large scale Graph Neural Networks (GNNs) requires significant computational resources, and the process is highly data-intensive. One of the most effective ways to reduce resource requirements is minibatch training coupled with graph sampling. GNNs have the unique property that items in a minibatch have overlapping data. However, the commonly implemented Independent Minibatching approach assigns each Processing Element (PE, i.e., cores and/or GPUs) its own minibatch to process, leading to duplicated computations and input data access across PEs. This amplifies the Neighborhood Explosion Phenomenon (NEP), which is the main bottleneck limiting scaling. To reduce the effects of NEP in the multi-PE setting, we propose a new approach called Cooperative Minibatching. Our approach capitalizes on the fact that the size of the sampled subgraph is a concave function of the batch size, leading to significant reductions in the amount of work as batch sizes increase. Hence, it is favorable for processors equipped with a fast interconnect to work on a large minibatch together as a single larger processor, instead of working on separate smaller minibatches, even though global batch size is identical. We also show how to take advantage of the same phenomenon in serial execution by generating dependent consecutive minibatches. Our experimental evaluations show up to 4x bandwidth savings for fetching vertex embeddings, by simply increasing this dependency without harming model convergence. Combining our proposed approaches, we achieve up to 64% speedup over Independent Minibatching on single-node multi-GPU systems, using same resources.
SGORP: A Subgradient-based Method for d-Dimensional Rectilinear Partitioning
arXiv (Cornell University) · 2023-10-03
preprintOpen accessSenior authorPartitioning for load balancing is a crucial first step to parallelize any type of computation. In this work, we propose SGORP, a new spatial partitioning method based on Subgradient Optimization, to solve the $d$-dimensional Rectilinear Partitioning Problem (RPP). Our proposed method allows the use of customizable objective functions as well as some user-specific constraints, such as symmetric partitioning on selected dimensions. Extensive experimental evaluation using over 600 test matrices shows that our algorithm achieves favorable performance against the state-of-the-art RPP and Symmetric RPP algorithms. Additionally, we show the effectiveness of our algorithm to do application-specific load balancing using two applications as motivation: Triangle Counting and Sparse Matrix Multiplication (SpGEMM), where we model their load-balancing problems as $3$-dimensional RPPs.
Batch Dynamic Algorithm to Find k-Cores and Hierarchies
arXiv (Cornell University) · 2022-03-24
preprintOpen accessSenior authorFinding $k$-cores in graphs is a valuable and effective strategy for extracting dense regions of otherwise sparse graphs. We focus on the important problem of maintaining cores on rapidly changing dynamic graphs, where batches of edge changes need to be processed quickly. Prior batch core algorithms have only addressed half the problem of maintaining cores, the problem of maintaining a core decomposition. This finds vertices that are dense, but not regions; it misses connectivity. To address this, we bring an efficient index from community search into the core domain, the Shell Tree Index. We develop a novel dynamic batch algorithm to maintain it that improves efficiency over processing edge-by-edge. We implement our algorithm and experimentally show that with it core queries can be returned on rapidly changing graphs quickly enough for interactive applications. For 1 million edge batches, on many graphs we run over $100\times$ faster than processing edge-by-edge while remaining under re-computing from scratch.
PGAbB: A Block-Based Graph Processing Framework for Heterogeneous Platforms
arXiv (Cornell University) · 2022-09-09
preprintOpen accessSenior authorDesigning flexible graph kernels that can run well on various platforms is a crucial research problem due to the frequent usage of graphs for modeling data and recent architectural advances and variety. In this work, we propose a novel graph processing framework, PGAbB (Parallel Graph Algorithms by Blocks), for modern shared-memory heterogeneous platforms. Our framework implements a block-based programming model. This allows a user to express a graph algorithm using kernels that operate on subgraphs. PGAbB support graph computations that fit in host DRAM but not in GPU device memory, and provides simple but effective scheduling techniques to schedule computations to all available resources in a heterogeneous architecture. We have demonstrated that one can easily implement a diverse set of graph algorithms in our framework by developing five algorithms. Our experimental results show that PGAbB implementations achieve better or competitive performance compared to hand-optimized implementations. Based on our experiments on five graph algorithms and forty-four graphs, in the median, PGAbB achieves 1.6, 1.6, 5.7, 3.4, 4.5, and 2.4 times better performance than GAPBS, Galois, Ligra, LAGraph Galois-GPU, and Gunrock graph processing systems, respectively.
MG-GCN: A Scalable multi-GPU GCN Training Framework
2022-08-29 · 7 citations
preprintOpen accessSenior authorFull batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs containing tens of millions of vertices or more. Recent work has shown that, for the graphs used in the machine learning community, communication becomes a bottleneck, and scaling is blocked outside of the single machine regime. Thus, we propose MG-GCN, a multi-GPU GCN training framework taking advantage of the high-speed communication links between the GPUs present in multi-GPU systems. MG-GCN employs multiple High-Performance Computing optimizations, including efficient re-use of memory buffers to reduce the memory footprint of training GNN models, as well as communication and computation overlap. These optimizations enable execution on larger datasets, that generally do not fit into the memory of a single GPU in state-of-the-art implementations. Furthermore, they contribute to achieving superior speedup compared to the state-of-the-art. For example, MG-GCN achieves super-linear speedup with respect to DGL, on the Reddit graph on both DGX-1 (V100) and DGX-A100.
Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning
2022-09-01 · 10 citations
articleEarly but promising results in quantum computing have been enabled by the concurrent development of quan-tum algorithms, devices, and materials. Classical simulation of quantum programs has enabled the design and analysis of algorithms and implementation strategies targeting current and anticipated quantum device architectures. In this paper, we present a graph-based approach to achieving efficient quantum circuit simulation. Our approach involves partitioning the graph representation of a given quantum circuit into acyclic sub-graphs/circuits that exhibit better data locality. Simulation of each sub-circuit is organized hierarchically, with the iterative construction and simulation of smaller state vectors, improving overall performance. Also, this partitioning reduces the number of passes through data, improving the total computation time. We present three partitioning strategies and observe that acyclic graph partitioning typically results in the best time-to-solution. In contrast, other strategies reduce the partitioning time at the expense of potentially increased simulation times. Experimental evaluation demonstrates the effectiveness of our approach.
Frequent coauthors
- 747 shared
David Padua
- 166 shared
Michael Gerndt
- 152 shared
Davide Sangiorgi
University of Bologna
- 110 shared
Piotr Łuszczek
- 110 shared
Jack Dongarra
- 93 shared
Cevdet Aykanat
Bilkent University
- 89 shared
George Almási
IBM (United States)
- 89 shared
John A. Gunnels
Education
- 2000
PhD, Computer Engineering and Information Science
Bilkent Universitesi
Awards & honors
- Fellow of IEEE
- Fellow of SIAM
- NSF CAREER award
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Ümit V. Çatalyürek
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup