This paper studies \emph{linear} and \emph{affine} error-correcting codes for correcting synchronization errors such as insertions and deletions. We call such codes linear/affine insdel codes. Linear codes that can correct even a single deletion are limited to information rate at most $1/2$ (achieved by the trivial 2-fold repetition code). Previously, it was (erroneously) reported that, more generally, no non-trivial linear codes correcting $k$ deletions exist, i.e., that the $(k+1)$-fold repetition code and its rate of $1/(k+1)$ are essentially optimal for any $k$. We disprove this and show the existence of binary linear codes of length $n$ and rate just below $1/2$ capable of correcting $\Omega(n)$ insertions and deletions. This identifies rate $1/2$ as a sharp threshold for recovery from deletions for linear codes, and reopens the quest for a better understanding of the capabilities of linear codes for correcting insertions and deletions. We prove novel outer bounds and existential inner bounds for the rate vs. (edit) distance trade-off of linear insdel codes. We complement our existential results with an efficient synchronization-string-based transformation that converts any asymptotically good linear code for Hamming errors into an asymptotically good linear code for insdel errors. Lastly, we show that the rate-$1/2$ limitation does not hold for affine codes by giving an explicit affine code of rate $1-\epsilon$ that can efficiently correct a constant fraction of insdel errors.
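As a concrete illustration of the trivial baseline mentioned above, the following minimal Python sketch implements the $(k+1)$-fold repetition code and a run-length decoder that recovers the message after up to $k$ deletions. The function names (`encode`, `decode`, `delete`) and the decoding strategy are illustrative assumptions and are not taken from the paper. The decoder relies on the observation that at most $k$ deletions cannot erase a whole run of the codeword, so a received run of length $r$ must come from $\lceil r/(k+1) \rceil$ repeated message bits.

```python
from itertools import groupby


def encode(bits, k):
    """(k+1)-fold repetition code: repeat every message bit k+1 times."""
    return [b for b in bits for _ in range(k + 1)]


def decode(received, k):
    """Recover the message from a codeword that lost at most k symbols.

    Every run of the codeword has length m*(k+1) for some m >= 1. With at
    most k deletions in total, no run vanishes and no two runs merge, so a
    received run of length r corresponds to ceil(r / (k+1)) message bits.
    """
    message = []
    for bit, run in groupby(received):
        r = len(list(run))
        message.extend([bit] * (-(-r // (k + 1))))  # ceiling division
    return message


def delete(word, positions):
    """Return `word` with the symbols at the given indices removed."""
    drop = set(positions)
    return [s for i, s in enumerate(word) if i not in drop]


# Small usage example (hypothetical parameters): k = 2, rate 1/(k+1) = 1/3.
k = 2
msg = [1, 0, 0, 1]
codeword = encode(msg, k)
corrupted = delete(codeword, (0, 5))  # two deletions
assert decode(corrupted, k) == msg
```

The code is linear over $\mathbb{F}_2$, since encoding the bitwise XOR of two messages gives the XOR of their codewords, and its rate is exactly $1/(k+1)$. This is the baseline that the results above improve upon.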