Rong Ge
Cue Family Associate Professor of Computer Science · Duke University · Computer Science
Active 2004–2025
About
Rong Ge is the Cue Family Associate Professor in the Computer Science Department at Duke University. He earned his Ph.D. from the Computer Science Department of Princeton University under the supervision of Sanjeev Arora and then was a postdoctoral researcher at Microsoft Research New England. His research spans theoretical computer science and machine learning, with a focus on understanding and formalizing hidden structures in data and designing efficient algorithms to uncover them. He studies problems arising in the analysis of text, images, and other data types, using techniques such as non-convex optimization and tensor decompositions. His work aims to provide provable algorithms for machine learning problems, contributing to the theoretical foundations of modern machine learning methods, including deep learning.
Research topics
- Computer science
- Parallel computing
- Artificial intelligence
- Computer security
- Embedded systems
- Operating systems
- Computer architecture
Selected publications
GALE: Leveraging Heterogeneous Systems for Efficient Unstructured Mesh Data Analysis
IEEE Transactions on Visualization and Computer Graphics · 2025-12-05
Article
Unstructured meshes present challenges in scientific data analysis due to irregular distribution and complex connectivity. Computing and storing connectivity information is a major bottleneck for visualization algorithms, affecting both time and memory performance. Recent task-parallel data structures address this by precomputing connectivity information at runtime while the analysis algorithm executes, effectively hiding computation costs and improving performance. However, existing approaches are CPU-bound, forcing the data structure and analysis algorithm to compete for the same computational resources, limiting potential speedups. To overcome this limitation, we introduce a novel task-parallel approach optimized for heterogeneous CPU-GPU systems. Specifically, we offload the computation of mesh connectivity information to GPU threads, enabling CPU threads to focus on executing the visualization algorithm. Following this paradigm, we propose GPU-Aided Localized data structurE (GALE), the first open-source CUDA-based data structure designed for heterogeneous task parallelism. Experiments on two 20-core CPUs and an NVIDIA V100 GPU show that GALE achieves up to 2.7× speedup over state-of-the-art localized data structures while maintaining memory efficiency.
Is In-Context Learning Feasible for HPC Performance Autotuning?
2025-06-03
Article
We examine whether in-context learning with Large Language Models (LLMs) can effectively address the challenges of High-Performance Computing (HPC) autotuning. LLMs have demonstrated remarkable natural language processing and artificial intelligence (AI) capabilities, sparking interest in their application across various domains, including HPC. Performance autotuning – the process of automatically optimizing system configurations to maximize efficiency through empirical evaluation – offers significant promise for enhancing application performance on larger systems and emerging architectures. However, this process remains computationally expensive due to the combinatorial explosion of configuration parameters and the complex, nonlinear relationships between configurations and performance outcomes. We pose a critical question: Can LLMs, without task-specific fine-tuning, accurately infer performance-configuration patterns by combining in-context examples with latent knowledge? To explore this, we leverage empirical performance data from real-world HPC systems, designing structured prompts and queries to evaluate LLMs’ capabilities. Our experiments reveal inherent limitations in applying in-context learning to performance autotuning, particularly for tasks requiring precise mathematical reasoning and analysis of complex multivariate dependencies. We provide empirical evidence of these shortcomings and discuss potential research directions to overcome these challenges.
International Journal of Education and Humanities · 2025-01-23 · 1 citation
Article · Open access · Senior author
Against the backdrop of continuing globalization and educational modernization, reforming the teaching system for "Ideological and Political Education in Navigation English" has become an urgent task. This paper analyzes the current state of "Ideological and Political Education in Navigation English" teaching and identifies existing problems, including a lack of ideological and political teaching content, outdated and monotonous teaching methods, and teachers' weak theoretical grounding in ideological and political education. In response, the design of the teaching system adheres to the principles of fostering virtue through education, integrating knowledge with moral education, practice orientation, and continuous improvement, and it comprehensively covers key stages such as needs analysis and goal setting, teaching content planning, selection and implementation of teaching methods, construction of an evaluation system, development of teaching staff, and provision of teaching resources. In practical tests, this teaching system significantly improved students' professional knowledge and language skills and achieved notable results in cultivating ideological and political quality and professional competence, continuously supplying high-quality talent to the navigation field and effectively promoting innovation and change in the Navigation English course.
AskHPC: A ChatBot for High Performance Computing User Support
2025-11-07 · 1 citation
Article · Open access · Senior author
High-Performance Computing (HPC) systems in the exascale era are increasingly heterogeneous, requiring users to navigate diverse tools, configurations, and best practices. However, essential information is often scattered across fragmented, multimodal documentation, making it difficult and time-consuming to locate. To address this, we present AskHPC, an intelligent question-answering ChatBot that delivers accurate, timely, and accessible information through a unified conversational interface. Built on a curated knowledge base integrating user guides, scheduler manuals, and programming documentation, AskHPC leverages Large Language Models (LLMs) within a Retrieval-Augmented Generation (RAG) framework. It employs two key techniques to improve HPC query responses: a modality-aware document parsing pipeline that preserves multimodal structure, and a dual-context strategy combining retrieved content (e.g., complete code blocks) with LLM-generated semantics. Evaluation, including a real-world user study, shows AskHPC outperforms direct LLM queries and vanilla RAG systems, enhancing user support and accelerating HPC software development.
2025-11-12 · 2 citations
Article · Open access · Senior author
Unified Memory (UM) technologies simplify memory management across CPU and GPU domains in GPU-accelerated heterogeneous architectures through transparent data migration. However, the default migration mechanism can severely degrade performance when applications oversubscribe GPU memory. Existing approaches to mitigating this performance degradation often fail to generalize, as they target specific application types, require specialized hardware, or integrate opaque classification methods.
SSRN Electronic Journal · 2025-01-01 · 1 citation
Preprint · Open access
Method for Recognition of Communication Interference Signals under Small-Sample Conditions
Applied Sciences · 2024-07-04 · 1 citation
Article · Open access · 1st author
To address the difficulty of obtaining a large number of labeled jamming signals in complex electromagnetic environments, this paper proposes a small-sample communication jamming signal recognition method based on WDCGAN-SA (Wasserstein Deep Convolution Generative Adversarial Network–Self Attention) and C-ResNet (Convolution Block Attention Module–Residual Network). First, building on the DCGAN architecture, we integrate the Wasserstein distance measure and a gradient penalty mechanism to design the jamming signal generation model WDCGAN for data augmentation. Second, we introduce a self-attention mechanism so that the generation model focuses on global correlation features in time–frequency maps, and we optimize the training strategy to improve the quality of generated samples. Finally, real samples are mixed with generated samples and fed into the classification network, which incorporates cross-channel and spatial information to improve jamming signal recognition rates. Simulation results show that under small-sample conditions with a Jamming-to-Noise Ratio (JNR) ranging from −10 dB to 10 dB, the proposed algorithm significantly outperforms the GAN, WGAN, and DCGAN baselines in recognizing six types of communication jamming signals.
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation
arXiv (Cornell University) · 2024-02-10
Preprint · Open access · Senior author
Data augmentation has been pivotal in successfully training deep learning models on classification tasks over the past decade. An important subclass of data augmentation techniques, which includes both label smoothing and Mixup, modifies not only the input data but also the input label during model training. In this work, we analyze the role played by the label augmentation aspect of such methods. We first prove that linear models on binary classification data trained with label augmentation learn only the minimum variance features in the data, while standard training (which includes weight decay) can learn higher variance features. We then use our techniques to show that even for nonlinear models and general data distributions, the label smoothing and Mixup losses are lower bounded by a function of the model output variance. Lastly, we demonstrate empirically that this aspect of label smoothing and Mixup can be both beneficial and harmful. On the one hand, we show that the strong performance of label smoothing and Mixup on image classification benchmarks is correlated with learning low variance hidden representations. On the other hand, we show that Mixup and label smoothing can be more susceptible to low variance spurious correlations in the training data.
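Both label augmentation methods named in the abstract change the training target as well as the input. As an illustration only, the sketch below shows the standard textbook forms: label smoothing blends a one-hot label with the uniform distribution using a weight `alpha`, and Mixup applies one convex weight `lam` (commonly drawn from a Beta(alpha, alpha) distribution) to a pair of inputs and to their labels. The function names and the plain-list representation are assumptions made for this sketch, not code from the paper.

```python
import random


def smooth_label(label: int, num_classes: int, alpha: float) -> list[float]:
    """Label smoothing: blend a one-hot target with the uniform distribution."""
    target = [alpha / num_classes] * num_classes
    target[label] += 1.0 - alpha  # the true class keeps most of the probability mass
    return target


def one_hot(label: int, num_classes: int) -> list[float]:
    """Plain one-hot target, used as the input to Mixup below."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def mixup(x1, y1, x2, y2, lam: float):
    """Mixup: the same convex weight lam mixes the inputs and the label vectors."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_mixup_weight(alpha: float = 0.2) -> float:
    """Draw lam from a symmetric Beta(alpha, alpha) distribution."""
    return random.betavariate(alpha, alpha)


if __name__ == "__main__":
    print(smooth_label(1, num_classes=4, alpha=0.1))
    x, y = mixup([0.0, 0.0], one_hot(0, 2), [2.0, 4.0], one_hot(1, 2), lam=0.25)
    print(x, y)
```

In both cases the label vector is no longer one-hot, which is the "label augmentation aspect" the paper isolates from the input-side transformation.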
Vendor-neutral and Production-grade Job Power Management in High Performance Computing
2024-11-17 · 3 citations
Article · Senior author
Power management and energy efficiency are critical research areas for exascale computing and beyond, necessitating reliable telemetry and control for distributed systems. Despite this need, existing approaches present several limitations precluding their adoption in production. These limitations include, but are not limited to, lack of portability due to vendor-specific and closed-source solutions, lack of support for non-MPI applications, and lack of user-level customization. We present a job-level power management framework based on Flux. We introduce flux-power-monitor and demonstrate its effectiveness on the Lassen (IBM Power AC922) and Tioga (HPE Cray EX235A) systems with a low average overhead of 0.4%. We also present flux-power-manager, where we discuss a proportional sharing policy and introduce a hierarchical FFT-based dynamic power management algorithm (FPP). We demonstrate that FPP reduces energy by 1% compared to proportional sharing, and by 20% compared to the default IBM static power capping policy.
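The abstract uses a proportional sharing policy as the baseline for FPP but does not define it. As a point of reference only, the sketch below shows one common reading, assuming each job's share of a cluster-wide power budget is weighted by its node count and then divided evenly across that job's nodes. The functions `proportional_share` and `per_node_caps` and the node-count weighting are hypothetical; this is not the flux-power-manager implementation, and FPP itself is not reproduced.

```python
def proportional_share(budget_watts: float, job_nodes: dict[str, int]) -> dict[str, float]:
    """Split a system-wide power budget across jobs in proportion to node count.

    Assumption: 'proportional sharing' is modeled as weighting by node count;
    the policy discussed in the paper may weight jobs differently.
    """
    total_nodes = sum(job_nodes.values())
    if total_nodes == 0:
        return {job: 0.0 for job in job_nodes}
    return {job: budget_watts * n / total_nodes for job, n in job_nodes.items()}


def per_node_caps(shares: dict[str, float], job_nodes: dict[str, int]) -> dict[str, float]:
    """Turn each job-level power share into an equal cap for each of its nodes."""
    return {job: shares[job] / job_nodes[job] for job in shares if job_nodes[job] > 0}


if __name__ == "__main__":
    nodes = {"a": 1, "b": 3}
    shares = proportional_share(1000.0, nodes)
    print(shares, per_node_caps(shares, nodes))
```

Under this static weighting every node receives the same cap regardless of what its job is doing, which is the kind of behavior a dynamic, job-aware algorithm such as FPP would aim to improve on.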
ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods
arXiv (Cornell University) · 2024-06-23
Preprint · Open access
The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the data used in their pretraining. Detecting such content is challenging due to the scale of the data and limited exposure of each instance during training. We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) to detect LLMs' pretraining data by leveraging their conditional language modeling capabilities. ReCaLL examines the relative change in conditional log-likelihoods when prefixing target data points with non-member context. Our empirical findings show that conditioning member data on non-member prefixes induces a larger decrease in log-likelihood compared to non-member data. We conduct comprehensive experiments and show that ReCaLL achieves state-of-the-art performance on the WikiMIA dataset, even with random and synthetic prefixes, and can be further improved using an ensemble approach. Moreover, we conduct an in-depth analysis of LLMs' behavior with different membership contexts, providing insights into how LLMs leverage membership information for effective inference at both the sequence and token level.
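Given the per-token log-probabilities a language model assigns to a target text x, once on its own and once after a non-member prefix P, the "relative change" described above can be summarized as one score. The sketch below assumes the score is the ratio LL(x | P) / LL(x) of average per-token log-likelihoods, one natural reading of "relative conditional log-likelihood". The function names and the decision threshold are hypothetical, and obtaining the log-probabilities from an actual LLM is left out.

```python
from statistics import fmean


def avg_log_likelihood(token_logprobs: list[float]) -> float:
    """Average per-token log-likelihood of a sequence (a negative number)."""
    return fmean(token_logprobs)


def recall_score(logprobs_plain: list[float], logprobs_with_prefix: list[float]) -> float:
    """Ratio of the prefixed log-likelihood LL(x | P) to the plain LL(x).

    Both averages are negative, so a score above 1 means the non-member prefix P
    lowered the likelihood of x; a larger score means a larger relative drop.
    """
    ll_plain = avg_log_likelihood(logprobs_plain)
    ll_prefixed = avg_log_likelihood(logprobs_with_prefix)
    return ll_prefixed / ll_plain


def predict_member(score: float, threshold: float) -> bool:
    """Flag x as likely pretraining data when the relative drop exceeds a threshold."""
    return score > threshold


if __name__ == "__main__":
    member_like = recall_score([-2.0, -2.0], [-2.5, -2.5])
    nonmember_like = recall_score([-3.0, -3.0], [-3.1, -3.1])
    print(member_like, predict_member(member_like, threshold=1.1))
    print(nonmember_like, predict_member(nonmember_like, threshold=1.1))
```

Following the abstract's finding that member data sees a larger log-likelihood decrease under a non-member prefix, a higher score is treated here as evidence of membership; in practice the threshold would have to be calibrated on labeled data.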
Recent grants
NSF · $252k · 2015–2018
NSF · $54k · 2011–2013
NSF · $508k · 2017–2022
NSF · $286k · 2019–2024
NSF · $400k · 2019–2024
Frequent coauthors
- Sham M. Kakade · 35 shared
- Sanjeev Arora · 26 shared
- Xizhou Feng, Menlo School · 26 shared
- Kirk W. Cameron, Virginia Tech · 19 shared
- Ziliang Zong, Texas State University · 18 shared
- Anima Anandkumar, California Institute of Technology · 14 shared
- Zizhong Chen · 13 shared
- Majid Janzamin, Twitter (United States) · 12 shared
Labs
The Rong Ge Lab focuses on theoretical computer science and machine learning, particularly in analyzing text, images, and other forms of data using techniques such as non-convex optimization and tensor decompositions.
Awards & honors
- Collaborative Research: Transferable, Hierarchical, Expressiv…
- NSF CAREER: Optimization Landscape for Non-convex Functions…