The deletion distance between two binary words $u,v \in \{0,1\}^n$ is the smallest $k$ such that $u$ and $v$ share a common subsequence of length $n-k$. A set $C$ of binary words of length $n$ is called a $k$-deletion code if every pair of distinct words in $C$ has deletion distance greater than $k$. In 1965, Levenshtein initiated the study of deletion codes by showing that, for $k\ge 1$ fixed and $n$ going to infinity, a $k$-deletion code $C\subseteq \{0,1\}^n$ of maximum size satisfies $\Omega_k(2^n/n^{2k}) \leq |C| \leq O_k( 2^n/n^k)$. We make the first asymptotic improvement to these bounds by showing that there exist $k$-deletion codes with size at least $\Omega_k(2^n \log n/n^{2k})$. Our proof is inspired by Jiang and Vardy's improvement to the classical Gilbert--Varshamov bounds. We also establish several related results on the number of longest common subsequences and shortest common supersequences of a pair of words with given length and deletion distance.
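The two definitions above can be checked directly on small examples. The following is a minimal sketch, not taken from the paper: it assumes words are given as Python strings of equal length, and the helper names `lcs_length`, `deletion_distance`, and `is_k_deletion_code` are hypothetical, chosen here for illustration. It computes the longest common subsequence by the standard dynamic program and uses the fact that the deletion distance is $n$ minus that length.

```python
from itertools import combinations


def lcs_length(u: str, v: str) -> int:
    """Length of a longest common subsequence of u and v (standard DP)."""
    prev = [0] * (len(v) + 1)  # DP row for the prefix of u seen so far
    for a in u:
        curr = [0]
        for j, b in enumerate(v, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def deletion_distance(u: str, v: str) -> int:
    """Smallest k such that u and v share a common subsequence of length n - k."""
    if len(u) != len(v):
        raise ValueError("words must have the same length n")
    return len(u) - lcs_length(u, v)


def is_k_deletion_code(code, k: int) -> bool:
    """True if every pair of distinct words has deletion distance greater than k."""
    words = list(set(code))  # illustrative choice: ignore repeated words
    return all(deletion_distance(u, v) > k for u, v in combinations(words, 2))
```

For instance, `deletion_distance("0101", "1010")` is 1, since the two words share the subsequence `010`, so the pair `{"0101", "1010"}` is not a 1-deletion code, whereas `{"0000", "1111"}` is a 3-deletion code because its two words have no common symbol. The pairwise check is quadratic in $|C|$ and is meant only for experimenting with small $n$.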