This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which \emph{all} substrings of some fixed length are read or substrings are read with no overlap, this work considers the setup in which consecutive substrings are read with some given minimum overlap. First, upper bounds are provided on the attainable rates of codes that guarantee unique reconstruction. Then, we present efficient constructions of asymptotically optimal codes that meet the upper bound.
翻译:本文介绍了一套新的重建代码,其动机是应用DNA数据储存和排序。在这类应用中,DNA链通过读取其子字符串中的某个子字符串顺序排列。虽然以前的工作考虑了两种极端情况,即阅读了某种固定长度的子字符串,或者阅读了某种固定长度的子字符串,没有重叠,但本文考虑了连续的子字符串与某些给定的最小重叠的设置。首先,对保证独特重建的可实现的代码速率规定了上限。然后,我们提出了符合上限的不现的最佳代码的有效结构。