Sequencing a DNA strand, as part of the read process in DNA storage, produces multiple noisy copies which can be combined to produce better estimates of the original strand; this is called trace reconstruction. One can reduce the error rate further by introducing redundancy in the write sequence and this is called coded trace reconstruction. In this paper, we model the DNA storage channel as an insertion-deletion-substitution (IDS) channel and design both encoding schemes and low-complexity decoding algorithms for coded trace reconstruction. We introduce Trellis BMA, a new reconstruction algorithm whose complexity is linear in the number of traces, and compare its performance to previous algorithms. Our results show that it reduces the error rate on both simulated and experimental data. The performance comparisons in this paper are based on a new dataset of traces that will be publicly released with the paper. Our hope is that this dataset will enable research progress by allowing objective comparisons between candidate algorithms.
翻译:作为DNA储存中读取过程的一部分,对DNA线进行脱氧分离,产生多种吵闹的复制件,可以合并,以产生对原始线的更好估计;这叫“痕量重建”;通过在写作序列中引入冗余,可以进一步降低误差率,这叫“编码痕量重建”。在本文中,我们将DNA储存通道建模为插入-删除替代频道(IDS),并设计编码系统和低兼容性解码算法,用于编码跟踪重建。我们引入了特雷利斯 BMA,这是一个新的重建算法,其复杂性在踪迹数量上线性,并将其性能与以前的算法进行比较。我们的结果显示,它降低了模拟数据和实验数据的误差率。本文的性能比较基于将公开发布的新的轨迹数据集。我们希望这一数据集将允许对候选算法进行客观的比较,从而使研究取得进展。