Motivated by computing duplication patterns in sequences, a new fundamental problem called the longest subsequence-repeated subsequence (LSRS) is proposed. Given a sequence $S$ of length $n$, a letter-repeated subsequence is a subsequence of $S$ in the form of $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$ with $x_i$ a subsequence of $S$, $x_j\neq x_{j+1}$ and $d_i\geq 2$ for all $i$ in $[k]$ and $j$ in $[k-1]$. We first present an $O(n^6)$ time algorithm to compute the longest cubic subsequences of all the $O(n^2)$ substrings of $S$, improving the trivial $O(n^7)$ bound. Then, an $O(n^6)$ time algorithm for computing the longest subsequence-repeated subsequence (LSRS) of $S$ is obtained. Finally we focus on two variants of this problem. We first consider the constrained version when $\Sigma$ is unbounded, each letter appears in $S$ at most $d$ times and all the letters in $\Sigma$ must appear in the solution. We show that the problem is NP-hard for $d=4$, via a reduction from a special version of SAT (which is obtained from 3-COLORING). We then show that when each letter appears in $S$ at most $d=3$ times, then the problem is solvable in $O(n^5)$ time.
翻译:源于计算序列中的重复模式,提出了一个新的基本问题,称为最长子序列- 重复子序列(LSRS)。给定长度为n的序列S,一个字母重复的子序列是一个形式为$x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$的S的子序列,其中 $x_i$ 是S的一个子序列,$x_j \ne x_{j+1}$,$i \in [k]$ 和 $j \in [k-1]$, $d_i \ge 2$。首先提出了一种 $O(n^6)$ 时间算法,用于计算S的所有$O(n^2)$ 子串的最长立方子序列,改善了显而易见的 $O(n^7)$上界。然后获得一个 $O(n^6)$ 时间算法来计算S的最长子序列- 重复子序列(LSRS)。最后,我们专注于这个问题的两个变体。首先考虑约束版本,当 $ \Sigma $ 无界时,每个字母在S中最多出现 $d$ 次,并且所有字母都必须出现在解决方案中。通过从3-COLORING获得的特殊版本的SAT减少,我们表明 $d=4$ 时问题是NP难的。然后证明,当每个字母在S中最多出现 $d=3$ 次时,该问题可在 $O(n^5)$ 的时间内解决。