Olgica Milenkovic
· Professor, Electrical and Computer EngineeringVerifiedUniversity of Illinois Urbana-Champaign · Computer Science
Active 1996–2026
About
Olgica Milenkovic is a Professor in the Department of Electrical and Computer Engineering at the University of Illinois Urbana-Champaign. She holds a PhD in Electrical and Computer Engineering and an MSc in Mathematics from the University of Michigan, Ann Arbor. Her research focuses on developing new approaches for studying problems in bioinformatics and bioengineering using coding and information theory, including the design of DNA microarrays with error- and quality-control features and DNA microarrays utilizing compressed sensing principles. She investigates the connections between compressed sensing, superimposed coding, and non-linear compressive sensing with quantization and fault-tolerant algorithms, applying these methods to problems such as RNA folding, gene-regulatory network reverse engineering, and genome reversal distances. Additionally, her work involves constructing and analyzing codes on graphs, studying the combinatorial properties of low-density parity-check codes, and exploring the relationships between network coding, matroid theory, and algebraic coding theory. Milenkovic is also engaged in analyzing the average case complexity of algorithms in coding theory and computer algebra.
Research topics
- Computer Science
- Machine Learning
- Data Mining
- Mathematics
- Data science
- Human–computer interaction
- Biology
- Engineering
- Genetics
- Econometrics
- Computational biology
- Statistics
Selected publications
Spatially-Coupled Network RNA Velocities: A Control-Theoretic Perspective
arXiv (Cornell University) · 2026-01-03
preprintOpen accessSenior authorRNA velocity is an important model that combines cellular spliced and unspliced RNA counts to infer dynamical properties of various regulatory functions. Despite its wide applicability and many variants used in practice, the model has not been adequately designed to directly account for both intracellular gene regulatory network interactions and spatial intercellular communications. Here, we propose a new RNA velocity approach that jointly and directly captures two new network structures: an intracellular gene regulatory network (GRN) and an intercellular interaction network that captures interactions between (neighboring) cells, with relevance to spatial transcriptomics. We theoretically analyze this two-level network system through the lens of control and consensus theory. In particular, we investigate network equilibria, stability, cellular network consensus, and optimal control approaches for targeted drug intervention.
Correcting Contextual Deletions in DNA Nanopore Readouts
arXiv (Cornell University) · 2026-02-04
articleOpen accessThe problem of designing codes for deletion-correction and synchronization has received renewed interest due to applications in DNA-based data storage systems that use nanopore sequencers as readout platforms. In almost all instances, deletions are assumed to be imposed independently of each other and of the sequence context. These assumptions are not valid in practice, since nanopore errors tend to occur within specific contexts. We study contextual nanopore deletion-errors through the example setting of deterministic single deletions following (complete) runlengths of length at least $k$. The model critically depends on the runlength threshold $k$, and we examine two regimes for $k$: a) $k=C\log n$ for a constant $C\in(0,1)$; in this case, we study error-correcting codes that can protect from a constant number $t$ of contextual deletions, and show that the minimum redundancy (ignoring lower-order terms) is between $(1-C)t\log n$ and $2(1-C)t\log n$, meaning that it is a ($1-C$)-fraction of that of arbitrary $t$-deletion-correcting codes. To complement our non-constructive redundancy upper bound, we design efficiently and encodable and decodable codes for any constant $t$. In particular, for $t=1$ and $C>1/2$ we construct efficient codes with redundancy that essentially matches our non-constructive upper bound; b) $k$ equal a constant; in this case we consider the extremal problem where the number of deletions is not bounded and a deletion is imposed after every run of length at least $k$, which we call the extremal contextual deletion channel. This combinatorial setting arises naturally by considering a probabilistic channel that introduces contextual deletions after each run of length at least $k$ with probability $p$ and taking the limit $p\to 1$. We obtain sharp bounds on the maximum achievable rate under the extremal contextual deletion channel for arbitrary constant $k$.
Spatially-Coupled Network RNA Velocities: A Control-Theoretic Perspective
ArXiv.org · 2026-01-03
articleOpen accessSenior authorRNA velocity is an important model that combines cellular spliced and unspliced RNA counts to infer dynamical properties of various regulatory functions. Despite its wide applicability and many variants used in practice, the model has not been adequately designed to directly account for both intracellular gene regulatory network interactions and spatial intercellular communications. Here, we propose a new RNA velocity approach that jointly and directly captures two new network structures: an intracellular gene regulatory network (GRN) and an intercellular interaction network that captures interactions between (neighboring) cells, with relevance to spatial transcriptomics. We theoretically analyze this two-level network system through the lens of control and consensus theory. In particular, we investigate network equilibria, stability, cellular network consensus, and optimal control approaches for targeted drug intervention.
Correcting Contextual Deletions in DNA Nanopore Readouts
Open MIND · 2026-02-04
preprintThe problem of designing codes for deletion-correction and synchronization has received renewed interest due to applications in DNA-based data storage systems that use nanopore sequencers as readout platforms. In almost all instances, deletions are assumed to be imposed independently of each other and of the sequence context. These assumptions are not valid in practice, since nanopore errors tend to occur within specific contexts. We study contextual nanopore deletion-errors through the example setting of deterministic single deletions following (complete) runlengths of length at least $k$. The model critically depends on the runlength threshold $k$, and we examine two regimes for $k$: a) $k=C\log n$ for a constant $C\in(0,1)$; in this case, we study error-correcting codes that can protect from a constant number $t$ of contextual deletions, and show that the minimum redundancy (ignoring lower-order terms) is between $(1-C)t\log n$ and $2(1-C)t\log n$, meaning that it is a ($1-C$)-fraction of that of arbitrary $t$-deletion-correcting codes. To complement our non-constructive redundancy upper bound, we design efficiently and encodable and decodable codes for any constant $t$. In particular, for $t=1$ and $C>1/2$ we construct efficient codes with redundancy that essentially matches our non-constructive upper bound; b) $k$ equal a constant; in this case we consider the extremal problem where the number of deletions is not bounded and a deletion is imposed after every run of length at least $k$, which we call the extremal contextual deletion channel. This combinatorial setting arises naturally by considering a probabilistic channel that introduces contextual deletions after each run of length at least $k$ with probability $p$ and taking the limit $p\to 1$. We obtain sharp bounds on the maximum achievable rate under the extremal contextual deletion channel for arbitrary constant $k$.
Nature Structural & Molecular Biology · 2026-01-13 · 3 citations
articleFragmentation in Data Deduplication Systems II: The Jump Metric
2025-06-22
articleSenior authorData deduplication refers to a collection of data processing strategies that aim to remove repeated data chunks stored by different users. Despite providing excellent storage savings, deduplication can lead to severe file fragmentation issues: data chunks of the same file may be stored at distal locations on the server, making reconstruction time-consuming. Here, we continue our analytical study of uncoded and coded deduplication methods with reduced fragmentation levels. We model files as self-avoiding (simple) paths in specialized graphs whose nodes correspond to data chunks. To measure the level of fragmentation, we introduce the jump metric which captures the worst-case number of times during the reconstruction process of a file that one has to change the readout location on the server. We derive lower and upper bounds on the degree of jump fragmentation, and provide a new algorithm for computing the jump number of hierarchical data structures captured by trees. We also present examples that show how repetition and coded redundancy in chunk stores can reduce jump fragmentation.
Biophysical Journal · 2025-02-01
articleDo LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
ArXiv.org · 2025-06-06
preprintOpen accessMachine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependencies and the non-deterministic nature of knowledge within LLMs. Consequently, facts presumed forgotten may persist implicitly through correlated information. To address these challenges, we propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge by representing relevant factual contexts as knowledge graphs with associated confidence scores. We further develop an inference-based evaluation protocol leveraging powerful LLMs as judges; these judges reason over the extracted knowledge subgraph to determine unlearning success. Our LLM judges utilize carefully designed prompts and are calibrated against human evaluations to ensure their trustworthiness and stability. Extensive experiments on our newly constructed benchmark demonstrate that our framework provides a more realistic and rigorous assessment of unlearning performance. Moreover, our findings reveal that current evaluation strategies tend to overestimate unlearning effectiveness. Our code is publicly available at https://github.com/Graph-COM/Knowledge_Unlearning.git.
Generalized Orthogonal De Bruijn Sequences
2025-06-22 · 1 citations
articleSenior authorA de Bruijn sequence of order <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$k$</tex> over a finite alphabet is a cyclic sequence with the property that it contains every possible <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$k$</tex>-sequence as a substring exactly once. Orthogonal de Bruijn sequences are collections of de Bruijn sequences of the same order, <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$k$</tex>, satisfying the joint constraint that every (<tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$k+1$</tex>) sequence appears as a substring in at most one of the sequences in the collection. Both de Bruijn and orthogonal de Bruijn sequences have found numerous applications in synthetic biology, although the latter topic remains largely unexplored in the coding theory literature. Here we study three relevant practical generalizations of orthogonal de Bruijn sequences where we relax either the constraint that every (<tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$k+1$</tex>) -sequence appears exactly once, or that the sequences themselves are de Bruijn rather than balanced de Bruijn sequences. We also provide lower and upper bounds on the number of fixed-weight orthogonal de Bruijn sequences.
Reducing Fragmentation in Data Deduplication Systems via Partial Repetition and Coding
IEEE Transactions on Information Theory · 2025-09-23
articleSenior authorData deduplication, one of the key features of modern Big Data storage devices, is the process of removing replicas of data chunks stored by different users. Despite the importance of deduplication, several drawbacks of the method, such as storage robustness and file fragmentation, have not been previously analyzed from a theoretical point of view. Storage robustness pertains to ensuring that deduplicated data can be used to reconstruct the original files without service disruptions and data loss. Fragmentation pertains to the problems of placing deduplicated data chunks of different user files in a proximity-preserving linear order, since neighboring chunks of the same file may be stored in sectors far apart on the server. This work proposes a new theoretical model for data fragmentation and introduces novel graph- and coding-theoretic approaches for reducing fragmentation via limited duplication (repetition coding) and coded deduplication (e.g., linear coding). In addition to alleviating issues with fragmentation, limited duplication and coded deduplication can also serve the dual purpose of increasing the robusteness of the system design. The contributions of our work are three-fold. First, we describe a new model for file structures of the form of self-avoiding (simple) paths in specialized graphs. Second, we introduce several new metrics for measuring the fragmentation level in deduplication systems on graph-structured files, including the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">stretch metric</i> that captures the worst-case “spread” of adjacent data chunks within a file when deduplicated and placed on the server; and, the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">jump metric</i> that captures the worst-case number of times during the reconstruction process of a file that one has to change the readout location on the server. For the stretch metric, we establish a connection between the level of fragmentation and the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">bandwidth</i> of the file-graph. In particular, we derive lower and upper bounds on the degree of fragmentation and describe instances of the problem where repetition and coding reduce fragmentation. The key ideas behind our approach are graph folding and information-theoretic arguments coupled with graph algorithms such as matching. For the jump metric, we provide a new algorithm for computing the jump number of hierarchical data structures captured by trees. Third, we describe how controlled repetition and coded redundancy added after deduplication can ensure valuable trade-offs between the storage volume and the degree of fragmentation.
Recent grants
CIF: Small: Nonlinear Matrix and Tensor Completion with Applications in Systems Biology
NSF · $478k · 2011–2016
Genomic Compression: From Information Theory to Parallel Algorithms
NIH · $448k · 2015–2018
Collaborative Research: Constrained and Error-Control Coding for DNA Computers
NSF · $88k · 2005–2007
SemiSynBio: An On-Chip Nanoscale Storage System Using Chimeric DNA
NSF · $1.1M · 2018–2023
Genomic Compression: From Information Theory to Parallel Algorithms
NIH · $167k · 2015–2018
Frequent coauthors
- 75 shared
Ryan Gabrys
University of California, San Diego
- 42 shared
Farzad Farnoud
- 40 shared
Eli Chien
- 37 shared
Chao Pan
Institute of Plasma Physics
- 36 shared
Han Mao Kiah
- 32 shared
Gregory J. Puleo
- 29 shared
Son Hoang Dau
- 27 shared
Jianhao Peng
University of Illinois Urbana-Champaign
Education
- 1994
Ph.D., Computer Science
University of Belgrade
- 1991
M.S., Computer Science
University of Belgrade
- 1988
B.S., Computer Science
University of Belgrade
Awards & honors
- NVMW Persistent Impact Prize, 2022, Portable and Error-Free…
- ECE Alumni Distinguished Educator Award, University of Michi…
- Canadian Workshop on Information Theory, Halifax, Canada, 20…
- Learning and Signal Processing Methods for Emerging Multiomi…
- Keynote speaker, 2023 Biennial Symposium on Communications,…
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Olgica Milenkovic
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup