In this paper, we propose a novel iterative encoding algorithm for DNA storage that uses a greedy algorithm to satisfy both the GC balance and run-length constraints. DNA strands with a run-length greater than three or a GC ratio far from 50\% are known to be prone to errors. The proposed encoding algorithm stores data at high information density while flexibly supporting a maximum run-length of $m$ and a GC ratio within $0.5\pm\alpha$ for arbitrary $m$ and $\alpha$. More importantly, we propose a novel mapping method, also built with a greedy algorithm, that reduces the average bit error compared to a randomly generated mapping. The proposed algorithm is implemented through iterative encoding consisting of three main steps: randomization, M-ary mapping, and verification. For $m=3$, it achieves an information density of 1.8616 bits/nt, which approaches the theoretical upper bound of 1.98 bits/nt, while satisfying both constraints. In addition, the average bit error caused by a single-nucleotide (1 nt) error is 2.3455 bits, a reduction of $20.5\%$ compared to the randomized mapping.
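To make the two constraints concrete, the following is a minimal Python sketch of what a verification check could look like: it tests whether a strand has a maximum run-length of at most $m$ and a GC ratio within $0.5\pm\alpha$. The function names and the default $\alpha=0.05$ are illustrative assumptions, not taken from the proposed algorithm, and the sketch does not cover the randomization or M-ary mapping steps.

```python
def max_run_length(strand: str) -> int:
    """Return the length of the longest run of identical consecutive nucleotides."""
    longest = 0
    current = 0
    previous = None
    for nt in strand:
        # Extend the current run, or start a new run of length 1.
        current = current + 1 if nt == previous else 1
        longest = max(longest, current)
        previous = nt
    return longest


def gc_ratio(strand: str) -> float:
    """Return the fraction of nucleotides in the strand that are G or C."""
    return sum(nt in "GC" for nt in strand) / len(strand)


def satisfies_constraints(strand: str, m: int = 3, alpha: float = 0.05) -> bool:
    """Check run-length <= m and |GC ratio - 0.5| <= alpha.

    The defaults are hypothetical: m = 3 matches the case reported above,
    while alpha = 0.05 is an assumed value chosen only for illustration.
    """
    if not strand:
        return False  # an empty strand carries no data; treated as invalid here
    return max_run_length(strand) <= m and abs(gc_ratio(strand) - 0.5) <= alpha
```

For example, `satisfies_constraints("ACGTACGT")` holds because the strand has no repeated neighbors and a GC ratio of exactly 0.5, whereas `"AAAACGCG"` fails the run-length check for $m=3$ and `"ATATATAC"` fails the GC balance check.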