To increase the information capacity of DNA storage, composite DNA letters were introduced. We propose a novel channel model for composite DNA in which composite sequences are decomposed into ordered standard non-composite sequences. The model is designed to handle any alphabet size and composite resolution parameter. We study the problem of reconstructing composite sequences of arbitrary resolution over the binary alphabet under substitution errors. We define two families of error-correcting codes and provide lower and upper bounds on their cardinality. In addition, we analyze the case in which a single deletion error occurs in the channel and present a systematic code construction for this setting. Finally, we briefly discuss the channel's capacity, which remains an open problem.