Synthetic polymer-based storage appears to be a particularly promising candidate for meeting the ever-increasing demand for archival storage. It involves designing molecules of distinct masses to represent the bits $\{0,1\}$, followed by the synthesis of a polymer whose sequence of molecular units reflects the order of bits in the information string. Reading out the stored data requires a tandem mass spectrometer, which fragments the polymer into shorter substrings and provides their corresponding masses, from which the \emph{composition}, i.e., the number of $1$s and $0$s in each substring, can be inferred. Prior works have dealt with the problem of unique string reconstruction from the multiset of compositions of all substrings, called the \emph{composition multiset}. This was accomplished either by determining which string lengths always allow unique reconstruction, or by formulating coding constraints that enable unique reconstruction for all string lengths. Additionally, error-correcting schemes have been suggested to deal with substitution errors caused by imprecise fragmentation during the readout process. This work builds on this research by generalizing the previously considered error models, which were mainly confined to substitutions of compositions. To this end, we define new error models that consider insertions of spurious compositions and deletions of existing ones, both of which corrupt the composition multiset. We analyze whether the reconstruction codebook proposed by Pattabiraman \emph{et al.} is robust to such errors and, if not, propose new coding constraints to remedy this.
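To make the notion of a composition multiset concrete, the following is a minimal Python sketch, not the authors' construction. It assumes a composition is recorded as a pair (number of $0$s, number of $1$s) and that the multiset ranges over all contiguous substrings. The function names `composition_multiset`, `delete_composition`, and `insert_composition` are hypothetical and chosen only for illustration; the two error helpers model a single deletion or insertion in the simplest possible way and may differ from the error models defined in the paper.

```python
from collections import Counter


def composition_multiset(s: str) -> Counter:
    """Return the multiset of (zeros, ones) pairs over all contiguous substrings of s."""
    counts = Counter()
    n = len(s)
    for i in range(n):
        zeros = ones = 0
        # Extend the substring s[i..j] one symbol at a time.
        for j in range(i, n):
            if s[j] == "1":
                ones += 1
            else:
                zeros += 1
            counts[(zeros, ones)] += 1
    return counts


def delete_composition(cm: Counter, comp: tuple) -> Counter:
    """Illustrative deletion error: remove one copy of an existing composition."""
    if cm[comp] <= 0:
        raise ValueError("composition not present in the multiset")
    out = cm.copy()
    out[comp] -= 1
    if out[comp] == 0:
        del out[comp]  # keep the Counter free of zero-count entries
    return out


def insert_composition(cm: Counter, comp: tuple) -> Counter:
    """Illustrative insertion error: add one spurious composition."""
    out = cm.copy()
    out[comp] += 1
    return out


if __name__ == "__main__":
    cm = composition_multiset("0110")
    print(sorted(cm.items()))
    # A string and its reversal always share the same composition multiset.
    print(composition_multiset("001") == composition_multiset("100"))
    print(sum(delete_composition(cm, (1, 1)).values()))
    print(sum(insert_composition(cm, (3, 0)).values()))
```

A string of length $n$ has $n(n+1)/2$ contiguous substrings, so an uncorrupted multiset has exactly that many elements; in this simplified setting, a single insertion or deletion changes the total by one.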