The problem of string reconstruction based on its substrings spectrum has received significant attention recently due to its applicability to DNA data storage and sequencing. In contrast to previous works, we consider in this paper a setup of this problem where multiple strings are reconstructed together. Given a multiset $S$ of strings, all their substrings of some fixed length $\ell$, defined as the $\ell$-profile of $S$, are received and the goal is to reconstruct all strings in $S$. A multi-strand $\ell$-reconstruction code is a set of multisets such that every element $S$ can be reconstructed from its $\ell$-profile. Given the number of strings~$k$ and their length~$n$, we first find a lower bound on the value of $\ell$ necessary for existence of multi-strand $\ell$-reconstruction codes with non-vanishing asymptotic rate. We then present two constructions of such codes and show that their rates approach~$1$ for values of $\ell$ that asymptotically behave like the lower bound.
翻译:基于其子字符串频谱的字符串重建问题最近由于适用于DNA数据存储和排序而引起极大关注。与以前的工作不同,我们在本文件中认为这个问题的设置是多字符串一起重建的。考虑到多个字符串的多套元S$,所有其固定长度为$S$的子字符串都收到,目标是以美元重整所有字符串。多字符串和美元重建代码是一套多套套套件,每套元S$都可以从其$\ell$的特征中重建。考虑到字符串~k$及其长度~n美元,我们首先发现一个较低的约束值,即多字符串和美元重建代码存在所需的美元/ell$美元价值,其价值以美元计为美元。然后我们提出两种这种代码的构建,并表明其利率为$1美元/ell$的汇率,其表现方式类似低调。