Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most $p$ edits, where a short duplication generates a copy of a substring with length $\leq 3$ and inserts the copy following the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to $p$ edits (in addition to duplications) at the additional cost of roughly $8p(\log_q n)(1+o(1))$ symbols of redundancy, thus achieving the same asymptotic rate, where $q\ge 4$ is the alphabet size and $p$ is a constant. Furthermore, the time complexities of both the encoding and decoding processes are polynomial when $p$ is a constant with respect to the code length.
翻译:由于DNA数据密度、寿命、能源效率和生成副本的便利性较高,DNA被认为是满足未来需要的一种很有希望的储存技术。 但是,在数据存储和检索的不同阶段,DNA中可能会出现一系列不同的错误,包括删除、插入、重复和替代。 目前的纸张为同时纠正短( tandem)重复和最多最多为$p美元编辑而创建了错误更正代码, 短暂的复制生成了一个长度为$leq 3的子字符串副本, 并在原始子字符串之后插入副本, 编辑是一种替代、 删除或插入。 与最新的重复代码相比, 拟议的代码最多更正到$p$( 除了重复), 额外成本约为 8p( log_ q n) (1+o(1)) 美元冗余符号, 从而达到同样的时间速率, $q\ge 4是字母大小, $p$是固定的。 另外, 编码和解译进程的时间复杂度都与 $ 的常数。