
Farzad Farnoud
· Assistant Professor, Electrical and Computer Engineering Assistant Professor, Computer ScienceVerifiedUniversity of Virginia · Computer Science
Active 2007–2025
About
Farzad Farnoud is an Assistant Professor in the Electrical and Computer Engineering Department and the Computer Science Department at the University of Virginia. His research interests include the information-theoretic and probabilistic analysis of genomic evolutionary processes, rank aggregation and gene prioritization, and coding for flash memory and DNA storage. He has a background that includes a B.S. in Electrical Engineering from Sharif University of Technology, an M.S. in Electrical and Computer Engineering from the University of Toronto, and both an M.S. in Mathematics and a Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. Farnoud previously served as a postdoctoral scholar at the California Institute of Technology. His work facilitates modeling, analysis, and storage of large datasets, especially biological data.
Research topics
- Computer Science
- Genetics
- Discrete mathematics
- Mathematics
- Biology
- Algorithm
Selected publications
Optimal Codes Correcting a Substring Edit
IEEE Transactions on Information Theory · 2025-04-21
articleSenior authorThe substring edit error replaces a substring <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">u</i> of <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">x</i> with another string <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">v</i>, where the lengths of <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">u</i> and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">v</i> are bounded by a given constant <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</i>. It encompasses localized insertions, deletions, and substitutions within a window. Codes correcting one substring edit have redundancy at least log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i> + <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k</i>. In this paper, we construct codes correcting one substring edit with redundancy log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i> + <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">O<sub>k</sub></i>(log log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i>), which is almost optimal. We also study the average-case document-exchange problem under one substring edit and construct a hash with an expected length of approximately 2 log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i> + <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">O<sub>k</sub></i>(log log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n</i>) for any iid distribution for the documents.
Linear List Decodable Edit-Correcting Codes with Rate Approaching $1$
ArXiv.org · 2025-06-13
preprintOpen accessSenior authorLinear codes correcting one deletions have rate at most $1/2$. In this paper, we construct linear list decodable codes correcting edits with rate approaching $1$ and reasonable list size. Our encoder and decoder run in polynomial time.
Constrained coding for error mitigation in nanopore-based DNA data storage
Scientific Reports · 2025-09-30 · 2 citations
articleOpen accessSenior authorCorrespondingDNA has been proposed as an alternative to magnetic and solid-state devices for storing digital data. In DNA data storage, writing data is performed through DNA synthesis, and reading is done via sequencing. Nanopore devices for sequencing DNA, like those produced by Oxford Nanopore Technologies, allow long reads and real-time sequencing but with lower accuracy compared to other sequencers, such as Illumina. To improve the reliability of data storage in DNA, we aim to combat the high error rate of nanopore sequencing using constrained coding. Certain aspects of the physical process underlying nanopore sequencing mean that some sequences are more prone to sequencing errors than others. We leverage this observation to design constrained codes using constrained de Bruijn graphs, along with a state-splitting encoder and a Viterbi-based decoder. We find that the overall performance of our novel coding system substantially improves upon the state-of-the-art DNN-based methods, reducing sequence-level errors by up to 6 times. We also visually demonstrate the performance of our approach through the simulated recovery of an image encoded and decoded using our method.
Improving Smoothness in Huffman Coding: Canonical and Pre-allocated Variations
2025-09-29
articleSenior authorA data compression scheme is said to be smooth if it preserves similarity. That is, under small changes in its input, the output changes little. In this paper, we focus on the smoothness of Huffman coding and its variations through the concept of codeword smoothness, which is the number of codewords that change in the code as a result of a small change. We show that the standard Huffman algorithm is highly volatile, as a single change in source data can change a large fraction of the codewords, causing large changes in the compressed data. We propose the use of Canonical Huffman Encoding for improved smoothness and show that it can significantly outperform the standard version. We also propose a new variant, Pre-allocated Huffman Coding and show that it has better smoothness compared to the canonical algorithm. In addition to our theoretical results, we perform simulations, confirming our theoretical finding.
Asymptotically Optimal Codes Correcting One Substring Edit
ArXiv.org · 2025-07-18
preprintOpen accessSenior authorThe substring edit error is the operation of replacing a substring $u$ of $x$ with another string $v$, where the lengths of $u$ and $v$ are bounded by a given constant $k$. It encompasses localized insertions, deletions, and substitutions within a window. Codes correcting one substring edit have redundancy at least $\log n+k$. In this paper, we construct codes correcting one substring edit with redundancy $\log n+O(\log \log n)$, which is asymptotically optimal.
Correcting a Substring Edit Error of Bounded Length
IEEE Transactions on Communications · 2024-07-05 · 1 citations
articleSenior authorLocalized errors, which occur in windows with bounded lengths, are common in a range of applications. Such errors can be modeled as k-substring edits, which replace one substring with another string, both with lengths upper bounded by k. This generalizes errors such as localized deletions or burst substitutions studied in the literature. In this paper, we show through statistical analysis of real data that substring edits better describe differences between related documents compared to independent edits, and thus commonly arise in problems related to data synchronization. We also show that for the dataset under study, assuming codes exist that can achieve the Gilbert-Varshamov (GV) bound, substring-edit-correcting codes can synchronize two documents with much lower overhead compared to general indel/substitution-correcting codes. Furthermore, given a constant k, we construct binary codes of length n for correcting a single k-substring edit that achieves the GV bound and subsequently has redundancy of asymptotically <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$2\log n$ </tex-math></inline-formula>, compared to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$4k\log n$ </tex-math></inline-formula>, the lowest redundancy achievable by an existing code for this problem. The time complexities of both encoding and decoding are polynomial with respect to n.
Asymptotically Optimal Codes Correcting One Substring Edit
2024-07-07 · 1 citations
articleSenior authorThe substring edit error is the operation of replacing a substring <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{u}$</tex> of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{x}$</tex> with another string <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{v}$</tex>, where the lengths of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{u}$</tex> and <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{v}$</tex> are bounded by a given constant <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$k$</tex>. It encompasses localized insertions, deletions, and substitutions within a window. Codes correcting one substring edit have redundancy at least <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\log n+k$</tex>. In this paper, we construct codes correcting one substring edit with redundancy <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\log n+O(\log\log n)$</tex>, which is asymptotically optimal. The full version of this paper is available online.<sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">1</sup>
Borda Regret Minimization for Generalized Linear Dueling Bandits
arXiv (Cornell University) · 2023-03-15
preprintOpen accessDueling bandits are widely used to model preferential feedback prevalent in many applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a rich class of generalized linear dueling bandit models, which cover many existing models. We first prove a regret lower bound of order $Ω(d^{2/3} T^{2/3})$ for the Borda regret minimization problem, where $d$ is the dimension of contextual vectors and $T$ is the time horizon. To attain this lower bound, we propose an explore-then-commit type algorithm for the stochastic setting, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$. We also propose an EXP3-type algorithm for the adversarial linear setting, where the underlying model parameter can change at each round. Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal. Empirical evaluations on both synthetic data and a simulated real-world environment are conducted to corroborate our theoretical analysis.
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
arXiv (Cornell University) · 2023-10-02
preprintOpen accessDueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems. While substantial efforts have been made to minimize the cumulative regret in dueling bandits, a notable gap in the current research is the absence of regret bounds that account for the inherent uncertainty in pairwise comparisons between the dueling arms. Intuitively, greater uncertainty suggests a higher level of difficulty in the problem. To bridge this gap, this paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM). We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^Tσ_t^2} + d\big)$, where $σ_t$ is the variance of the pairwise comparison in round $t$, $d$ is the dimension of the context vectors, and $T$ is the time horizon. Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde O(d)$ regret. We perform empirical experiments on synthetic data to confirm the advantage of our method over previous variance-agnostic algorithms.
Low-Redundancy Codes for Correcting Multiple Short-Duplication and Edit Errors
IEEE Transactions on Information Theory · 2023-01-02 · 8 citations
articleSenior authorDue to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising technology for satisfying future storage needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$p$ </tex-math></inline-formula> edits, where a short duplication generates a copy of a substring with length <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\leq 3$ </tex-math></inline-formula> and inserts the copy following the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$p$ </tex-math></inline-formula> edits (in addition to duplications) at the additional cost of roughly <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$8p(\log _{q} n) (1+o(1))$ </tex-math></inline-formula> symbols of redundancy, thus achieving the same asymptotic rate, where <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$q\ge 4$ </tex-math></inline-formula> is the alphabet size and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$p$ </tex-math></inline-formula> is a constant. Furthermore, the time complexities of both the encoding and decoding processes are polynomial when <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$p$ </tex-math></inline-formula> is a constant with respect to the code length.
Recent grants
CRII: CIF: Model-based Compression of Biological Sequences
NSF · $175k · 2018–2022
NSF · $250k · 2019–2024
NSF · $313k · 2018–2023
Frequent coauthors
- 42 shared
Olgica Milenković
- 39 shared
Jehoshua Bruck
California Institute of Technology
- 27 shared
Moshe Schwartz
Ben-Gurion University of the Negev
- 17 shared
Yuanyuan Tang
University of Virginia
- 16 shared
Ryan Gabrys
University of California, San Diego
- 16 shared
Hao Lou
- 15 shared
Siddharth Jain
University of Petroleum and Energy Studies
- 8 shared
Shuche Wang
National University of Singapore
Education
Masters, Math
University of Illinois at Urbana-Champaign
Masters, ECE
University of Toronto
PhD, ECE
University of Illinois at Urbana-Champaign
Awards & honors
- Robert T. Chien Memorial Award from the University of Illino…
- IEEE Data Storage Best Student Paper Award (2014)
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Farzad Farnoud
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup