Farzad Farnoud

· Assistant Professor, Electrical and Computer Engineering Assistant Professor, Computer ScienceVerified

University of Virginia · Computer Science

Active 2007–2025

h-index19

Citations1.3k

Papers13034 last 5y

Funding$738k

Faculty page Lab page Website

See your match with Farzad Farnoud — sign in to PhdFit.Sign in

About

Farzad Farnoud is an Assistant Professor in the Electrical and Computer Engineering Department and the Computer Science Department at the University of Virginia. His research interests include the information-theoretic and probabilistic analysis of genomic evolutionary processes, rank aggregation and gene prioritization, and coding for flash memory and DNA storage. He has a background that includes a B.S. in Electrical Engineering from Sharif University of Technology, an M.S. in Electrical and Computer Engineering from the University of Toronto, and both an M.S. in Mathematics and a Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. Farnoud previously served as a postdoctoral scholar at the California Institute of Technology. His work facilitates modeling, analysis, and storage of large datasets, especially biological data.

Research topics

Computer Science
Genetics
Discrete mathematics
Mathematics
Biology
Algorithm

Selected publications

Optimal Codes Correcting a Substring Edit
IEEE Transactions on Information Theory · 2025-04-21
articleSenior author
The substring edit error replaces a substring <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">u of <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">x with another string <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">v, where the lengths of <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">u and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">v are bounded by a given constant <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k. It encompasses localized insertions, deletions, and substitutions within a window. Codes correcting one substring edit have redundancy at least log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n + <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">k. In this paper, we construct codes correcting one substring edit with redundancy log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n + <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Ok(log log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n), which is almost optimal. We also study the average-case document-exchange problem under one substring edit and construct a hash with an expected length of approximately 2 log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n + <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Ok(log log <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">n) for any iid distribution for the documents.
Publisher DOI
Linear List Decodable Edit-Correcting Codes with Rate Approaching $1$
ArXiv.org · 2025-06-13
preprintOpen accessSenior author
Linear codes correcting one deletions have rate at most $1/2$. In this paper, we construct linear list decodable codes correcting edits with rate approaching $1$ and reasonable list size. Our encoder and decoder run in polynomial time.
Publisher OA PDF DOI
Constrained coding for error mitigation in nanopore-based DNA data storage
Scientific Reports · 2025-09-30 · 2 citations
articleOpen accessSenior authorCorresponding
DNA has been proposed as an alternative to magnetic and solid-state devices for storing digital data. In DNA data storage, writing data is performed through DNA synthesis, and reading is done via sequencing. Nanopore devices for sequencing DNA, like those produced by Oxford Nanopore Technologies, allow long reads and real-time sequencing but with lower accuracy compared to other sequencers, such as Illumina. To improve the reliability of data storage in DNA, we aim to combat the high error rate of nanopore sequencing using constrained coding. Certain aspects of the physical process underlying nanopore sequencing mean that some sequences are more prone to sequencing errors than others. We leverage this observation to design constrained codes using constrained de Bruijn graphs, along with a state-splitting encoder and a Viterbi-based decoder. We find that the overall performance of our novel coding system substantially improves upon the state-of-the-art DNN-based methods, reducing sequence-level errors by up to 6 times. We also visually demonstrate the performance of our approach through the simulated recovery of an image encoded and decoded using our method.
Publisher OA PDF DOI
Improving Smoothness in Huffman Coding: Canonical and Pre-allocated Variations
2025-09-29
articleSenior author
A data compression scheme is said to be smooth if it preserves similarity. That is, under small changes in its input, the output changes little. In this paper, we focus on the smoothness of Huffman coding and its variations through the concept of codeword smoothness, which is the number of codewords that change in the code as a result of a small change. We show that the standard Huffman algorithm is highly volatile, as a single change in source data can change a large fraction of the codewords, causing large changes in the compressed data. We propose the use of Canonical Huffman Encoding for improved smoothness and show that it can significantly outperform the standard version. We also propose a new variant, Pre-allocated Huffman Coding and show that it has better smoothness compared to the canonical algorithm. In addition to our theoretical results, we perform simulations, confirming our theoretical finding.
Publisher DOI
Asymptotically Optimal Codes Correcting One Substring Edit
ArXiv.org · 2025-07-18
preprintOpen accessSenior author
The substring edit error is the operation of replacing a substring $u$ of $x$ with another string $v$, where the lengths of $u$ and $v$ are bounded by a given constant $k$. It encompasses localized insertions, deletions, and substitutions within a window. Codes correcting one substring edit have redundancy at least $\log n+k$. In this paper, we construct codes correcting one substring edit with redundancy $\log n+O(\log \log n)$, which is asymptotically optimal.
Publisher OA PDF DOI
Correcting a Substring Edit Error of Bounded Length
IEEE Transactions on Communications · 2024-07-05 · 1 citations
articleSenior author
Localized errors, which occur in windows with bounded lengths, are common in a range of applications. Such errors can be modeled as k-substring edits, which replace one substring with another string, both with lengths upper bounded by k. This generalizes errors such as localized deletions or burst substitutions studied in the literature. In this paper, we show through statistical analysis of real data that substring edits better describe differences between related documents compared to independent edits, and thus commonly arise in problems related to data synchronization. We also show that for the dataset under study, assuming codes exist that can achieve the Gilbert-Varshamov (GV) bound, substring-edit-correcting codes can synchronize two documents with much lower overhead compared to general indel/substitution-correcting codes. Furthermore, given a constant k, we construct binary codes of length n for correcting a single k-substring edit that achieves the GV bound and subsequently has redundancy of asymptotically <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$2\log n$ </tex-math></inline-formula>, compared to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$4k\log n$ </tex-math></inline-formula>, the lowest redundancy achievable by an existing code for this problem. The time complexities of both encoding and decoding are polynomial with respect to n.
Publisher DOI
Asymptotically Optimal Codes Correcting One Substring Edit
2024-07-07 · 1 citations
articleSenior author
The substring edit error is the operation of replacing a substring <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{u}$</tex> of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{x}$</tex> with another string <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{v}$</tex>, where the lengths of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{u}$</tex> and <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\boldsymbol{v}$</tex> are bounded by a given constant <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$k$</tex>. It encompasses localized insertions, deletions, and substitutions within a window. Codes correcting one substring edit have redundancy at least <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\log n+k$</tex>. In this paper, we construct codes correcting one substring edit with redundancy <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\log n+O(\log\log n)$</tex>, which is asymptotically optimal. The full version of this paper is available online.1
Publisher DOI
Borda Regret Minimization for Generalized Linear Dueling Bandits
arXiv (Cornell University) · 2023-03-15
preprintOpen access
Dueling bandits are widely used to model preferential feedback prevalent in many applications such as recommendation systems and ranking. In this paper, we study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score while minimizing the cumulative regret. We propose a rich class of generalized linear dueling bandit models, which cover many existing models. We first prove a regret lower bound of order $Ω(d^{2/3} T^{2/3})$ for the Borda regret minimization problem, where $d$ is the dimension of contextual vectors and $T$ is the time horizon. To attain this lower bound, we propose an explore-then-commit type algorithm for the stochastic setting, which has a nearly matching regret upper bound $\tilde{O}(d^{2/3} T^{2/3})$. We also propose an EXP3-type algorithm for the adversarial linear setting, where the underlying model parameter can change at each round. Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal. Empirical evaluations on both synthetic data and a simulated real-world environment are conducted to corroborate our theoretical analysis.
Publisher OA PDF DOI
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
arXiv (Cornell University) · 2023-10-02
preprintOpen access
Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems. While substantial efforts have been made to minimize the cumulative regret in dueling bandits, a notable gap in the current research is the absence of regret bounds that account for the inherent uncertainty in pairwise comparisons between the dueling arms. Intuitively, greater uncertainty suggests a higher level of difficulty in the problem. To bridge this gap, this paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM). We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^Tσ_t^2} + d\big)$, where $σ_t$ is the variance of the pairwise comparison in round $t$, $d$ is the dimension of the context vectors, and $T$ is the time horizon. Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $\tilde O(d)$ regret. We perform empirical experiments on synthetic data to confirm the advantage of our method over previous variance-agnostic algorithms.
Publisher OA PDF DOI
Low-Redundancy Codes for Correcting Multiple Short-Duplication and Edit Errors
IEEE Transactions on Information Theory · 2023-01-02 · 8 citations
articleSenior author
Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising technology for satisfying future storage needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$p$ </tex-math></inline-formula> edits, where a short duplication generates a copy of a substring with length <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$\leq 3$ </tex-math></inline-formula> and inserts the copy following the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$p$ </tex-math></inline-formula> edits (in addition to duplications) at the additional cost of roughly <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$8p(\log _{q} n) (1+o(1))$ </tex-math></inline-formula> symbols of redundancy, thus achieving the same asymptotic rate, where <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$q\ge 4$ </tex-math></inline-formula> is the alphabet size and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$p$ </tex-math></inline-formula> is a constant. Furthermore, the time complexities of both the encoding and decoding processes are polynomial when <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$p$ </tex-math></inline-formula> is a constant with respect to the code length.
Publisher DOI

Recent grants

CRII: CIF: Model-based Compression of Biological Sequences
NSF · $175k · 2018–2022
CIF: Small: Collaborative Research: Rank Aggregation with Heterogeneous Information Sources: Efficient Algorithms and Fundamental Limits
NSF · $250k · 2019–2024
CIF: NSF-BSF: Small: Collaborative Research: Characterization and Mitigation of Noise in a Live DNA Storage Channel
NSF · $313k · 2018–2023

Frequent coauthors

Olgica Milenković
42 shared
Jehoshua Bruck
California Institute of Technology
39 shared
Moshe Schwartz
Ben-Gurion University of the Negev
27 shared
Yuanyuan Tang
University of Virginia
17 shared
Ryan Gabrys
University of California, San Diego
16 shared
Hao Lou
16 shared
Siddharth Jain
University of Petroleum and Energy Studies
15 shared
Shuche Wang
National University of Singapore
8 shared

Education

Masters, Math
University of Illinois at Urbana-Champaign
Masters, ECE
University of Toronto
PhD, ECE
University of Illinois at Urbana-Champaign

Awards & honors

Robert T. Chien Memorial Award from the University of Illino…
IEEE Data Storage Best Student Paper Award (2014)

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Farzad Farnoud

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you