Consider two or more strings $\mathbf{x}^1,\mathbf{x}^2,\ldots,$ that are concatenated to form $\mathbf{x}=\langle \mathbf{x}^1,\mathbf{x}^2,\ldots \rangle$. Suppose that up to $\delta$ deletions occur in each of the concatenated strings. Since deletions alter the lengths of the strings, a fundamental question to ask is: how much redundancy do we need to introduce in $\mathbf{x}$ in order to recover the boundaries of $\mathbf{x}^1,\mathbf{x}^2,\ldots$? This boundary problem is equivalent to the problem of designing codes that can detect the exact number of deletions in each concatenated string. In this work, we answer the question above by first deriving converse results that give lower bounds on the redundancy of deletion-detecting codes. Then, we present a marker-based code construction whose redundancy is asymptotically optimal in $\delta$ among all families of deletion-detecting codes, and exactly optimal among all block-by-block decodable codes. To exemplify the usefulness of such deletion-detecting codes, we apply our code to trace reconstruction and design an efficient coded reconstruction scheme that requires a constant number of traces.
翻译:考虑两个或多个串 $\mathbf{x}^1,\mathbf{x}^2,\ldots$,它们被连接起来形成 $\mathbf{x}=\langle \mathbf{x}^1,\mathbf{x}^2,\ldots \rangle$。假设在每个连接的字符串中最多发生 $\delta$ 次删除操作。由于删除操作会改变字符串的长度,因此一个基本问题就是:我们需要在 $\mathbf{x}$ 中引入多少冗余信息才能恢复 $\mathbf{x}^1,\mathbf{x}^2,\ldots$ 的边界?这个边界问题等价于设计可检测每个连接字符串中确切删除操作数量的编码问题。在这项工作中,我们首先通过推导反证结果来得出删除检测代码的冗余下限。然后,我们提出一种基于标记的编码构造方案,其冗余在所有删除检测代码族中随着 $\delta$ 收敛于渐近最优,并且在所有分块可解码代码中达到完全最优。为了说明这种删除检测代码的实用性,我们将代码应用到轨迹重建中,设计了一个高效的编码重建方案,仅需要常数个轨迹。