We consider error-correcting coding for DNA-based storage. We model the DNA storage channel as a multi-draw insertion, deletion, and substitution (IDS) channel: the input data is chunked into $M$ short DNA strands, each strand is copied a random number of times, and the channel outputs a random selection of $N$ noisy DNA strands, each of which is prone to IDS errors. We propose an index-based concatenated coding scheme that combines an outer code, an index code, and an inner synchronization code, where the latter two tackle IDS errors. We further propose a mismatched joint index-synchronization code maximum a posteriori probability decoder with optional clustering to infer symbolwise a posteriori probabilities for the outer decoder. We compute achievable information rates for the outer code and present Monte Carlo simulations on experimental data.
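To make the channel model concrete, the following is a minimal simulation sketch, not the model used in the paper. It assumes that each of the $N$ output strands is drawn uniformly with replacement from the $M$ input strands, and that IDS errors occur independently per base with fixed probabilities. The names `ids_channel` and `multi_draw_channel` and the parameters `p_ins`, `p_del`, and `p_sub` are hypothetical and chosen for illustration only.

```python
import random

ALPHABET = "ACGT"


def ids_channel(strand, p_ins, p_del, p_sub, rng):
    """Pass one strand through an i.i.d. IDS channel (assumed error model).

    Requires p_ins < 1 so that the insertion loop terminates.
    """
    out = []
    for base in strand:
        # Insert random bases before the current base; each further
        # insertion happens with probability p_ins.
        while rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base is dropped
        if r < p_del + p_sub:
            # substitution: replace by a different base, chosen uniformly
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)  # base transmitted correctly
    return "".join(out)


def multi_draw_channel(strands, n_draws, p_ins, p_del, p_sub, seed=None):
    """Draw n_draws noisy reads from the M input strands.

    Assumption: each draw picks an input strand uniformly with replacement.
    The source index is returned for inspection only; a real decoder
    does not observe it.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_draws):
        idx = rng.randrange(len(strands))
        reads.append((idx, ids_channel(strands[idx], p_ins, p_del, p_sub, rng)))
    return reads


if __name__ == "__main__":
    strands = ["ACGTAC", "TTGACA", "GGCATC"]  # M = 3 toy input strands
    for idx, read in multi_draw_channel(strands, 5, 0.02, 0.02, 0.02, seed=0):
        print(idx, read)
```

Because the draws are with replacement, some input strands may appear several times among the $N$ reads and others not at all, which is why the decoder needs the index code to recover strand order and the outer code to cover missing strands.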