DNA data storage has recently attracted much attention because of its durable preservation and extremely high information density (bits per gram). In this work, we propose a hybrid coding strategy comprising generalized constrained codes, which enforce the homopolymer (run-length) limit, and a protograph-based low-density parity-check (LDPC) code, which corrects asymmetric nucleotide-level (A/T/C/G) substitution errors that may occur during DNA sequencing. Two sequencing technologies, namely Nanopore and Illumina sequencing, are analyzed together with their equivalent channel models and capacities. We propose a coding architecture that can potentially eliminate the catastrophic errors caused by error propagation in constrained decoding while retaining high coding potential. We also show how to compute the log-likelihood ratios (LLRs) for belief propagation decoding under this architecture. Simulation results and theoretical analysis show that the proposed coding scheme achieves good bit-error rate (BER) performance and high coding potential ($\sim1.98$ bits per nucleotide).
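To give a feel for what "coding potential" means under a homopolymer limit, the sketch below computes the noiseless capacity of quaternary sequences whose runs never exceed a maximum length $L$. Such sequences satisfy $N(n) = 3\sum_{k=1}^{L} N(n-k)$, so the capacity is $\log_2 \lambda$, where $\lambda$ is the largest real root of $x^L = 3(x^{L-1} + \dots + 1)$. The function names and the choice $L = 3$ are illustrative assumptions; the abstract does not state which run-length limit the scheme uses, and this is not the authors' construction. The result is only an upper bound on the rate of any code meeting the constraint, before any LDPC redundancy is added.

```python
import itertools
import math


def rll_growth_rate(max_run: int, q: int = 4) -> float:
    """Largest real root of x^L = (q-1)(x^{L-1} + ... + 1), found by bisection.

    f(1) < 0 and f(q) = 1 > 0, and by Descartes' rule there is exactly one
    positive root, so bisection on [1, q] converges to it.
    """
    def f(x: float) -> float:
        return x**max_run - (q - 1) * sum(x**k for k in range(max_run))

    lo, hi = 1.0, float(q)
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rll_capacity(max_run: int, q: int = 4) -> float:
    """Noiseless capacity in bits per symbol (bits per nucleotide for q = 4)."""
    return math.log2(rll_growth_rate(max_run, q))


def max_run_length(seq) -> int:
    """Length of the longest homopolymer run in a non-empty sequence."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def count_rll(n: int, max_run: int, alphabet: str = "ACGT") -> int:
    """Brute-force count of length-n strands whose runs are at most max_run."""
    return sum(
        1
        for s in itertools.product(alphabet, repeat=n)
        if max_run_length(s) <= max_run
    )


if __name__ == "__main__":
    # Hypothetical homopolymer limit of 3 nucleotides.
    print(f"{rll_capacity(3):.2f} bits per nucleotide")
```

With $L = 3$ this bound is roughly 1.98 bits per nucleotide. The brute-force counter is there only to sanity-check the recurrence on short strands. For example, with $L = 3$ there are 252 valid strands of length 4, because only the four strands AAAA, CCCC, GGGG, and TTTT are excluded.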