A tandem duplication is the insertion of a copy of a segment of DNA adjacent to its original position. More formally, a tandem duplication is an operation that converts a string $S = AXB$ into a string $T = AXXB$. Because they appear to be involved in genetic disorders, tandem duplications are widely studied in computational biology. Tandem duplication mechanisms have also been studied recently in other contexts, including formal languages, information theory, and error-correcting codes for DNA storage systems. The problem of determining the complexity of computing the tandem duplication distance between two given strings was proposed by [Leupold et al., 2004] and, very recently, was shown to be NP-hard for the case of unbounded alphabets [Lafond et al., STACS2020]. In this paper, we significantly improve this result and show that the tandem duplication distance problem is NP-hard already for strings over an alphabet of size $\leq 5$. We also study some special classes of strings where it is possible to give linear-time solutions to the existence problem: given strings $S$ and $T$ over the same alphabet, decide whether there exists a sequence of duplications converting $S$ into $T$. Previously, a polynomial-time algorithm for the existence problem was known only for the case of the binary alphabet.
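To make the operation $S = AXB \to T = AXXB$ and the two problems concrete, here is a minimal Python sketch. It is a brute-force breadth-first search, not the algorithms of this paper: it takes exponential time in general and is meant only for very short strings. The function names `tandem_duplicate`, `successors`, and `td_distance` are hypothetical and chosen for illustration.

```python
from collections import deque


def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Apply one tandem duplication: with X = s[start:start+length], map AXB to AXXB."""
    if length <= 0 or start < 0 or start + length > len(s):
        raise ValueError("segment out of range")
    end = start + length
    return s[:end] + s[start:end] + s[end:]


def successors(s: str, max_len: int) -> set:
    """All strings reachable from s by one duplication, no longer than max_len."""
    out = set()
    for start in range(len(s)):
        for length in range(1, len(s) - start + 1):
            # Duplications only lengthen a string, so longer results can be pruned.
            if len(s) + length <= max_len:
                out.add(tandem_duplicate(s, start, length))
    return out


def td_distance(source: str, target: str):
    """Brute-force tandem duplication distance from source to target.

    Returns the minimum number of duplications, or None if target is
    unreachable (which also answers the existence problem).
    Illustrative only: worst-case running time is exponential.
    """
    if source == target:
        return 0
    if len(source) > len(target):
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, dist = queue.popleft()
        for nxt in successors(current, len(target)):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

For example, `td_distance("ab", "abab")` needs one duplication (of `ab`), `td_distance("ab", "aabb")` needs two (first `a`, then `b`), and `td_distance("ab", "ba")` returns `None` because no sequence of duplications converts `ab` into `ba`.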