In the emerging field of DNA storage, data is encoded as DNA sequences and stored. The data is read out again by sequencing the stored DNA. Nanopore sequencing is a new sequencing technology that has many advantages over other methods; in particular, it is cheap, portable, and can support longer reads. While several practical coding schemes have been developed for DNA storage with nanopore sequencing, the theory is not well understood. Towards that end, we study a highly abstracted (deterministic) version of the nanopore sequencer, which highlights key features that make its analysis difficult. We develop methods and theory to understand the capacity of our abstracted model, and we propose efficient coding schemes and algorithms.