Motivated by the computation of duplication patterns in sequences, we propose a new fundamental problem, the longest subsequence-repeated subsequence (LSRS) problem. Given a sequence $S$ of length $n$, a subsequence-repeated subsequence is a subsequence of $S$ of the form $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$, where each $x_i$ is a subsequence of $S$, $x_j\neq x_{j+1}$ for all $j\in[k-1]$, and $d_i\geq 2$ for all $i\in[k]$. We first present an $O(n^6)$ time algorithm that computes the longest cubic subsequences of all $O(n^2)$ substrings of $S$, improving on the trivial $O(n^7)$ bound. We then obtain an $O(n^6)$ time algorithm for computing the LSRS of $S$. Finally, we focus on two variants of this problem. We first consider the constrained version in which $\Sigma$ is unbounded, each letter appears in $S$ at most $d$ times, and every letter in $\Sigma$ must appear in the solution. We show that this problem is NP-hard for $d=4$, via a reduction from a special version of SAT (which is obtained from 3-COLORING). We then show that when each letter appears in $S$ at most $d=3$ times, the problem is solvable in $O(n^5)$ time.
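To make the definition concrete, the following is a minimal brute-force sketch in Python. It is not the polynomial-time algorithm described above; it only illustrates the form $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$ on very short inputs. The function names `is_srs` and `lsrs_bruteforce` are hypothetical, and treating the empty string as a valid (trivial, $k=0$) answer is an assumption made here for convenience.

```python
from functools import lru_cache
from itertools import combinations


def is_srs(t: str) -> bool:
    """Check whether t = x_1^{d_1} ... x_k^{d_k} with every d_i >= 2
    and adjacent blocks x_j != x_{j+1} (illustrative checker only)."""

    @lru_cache(maxsize=None)
    def parse(start: int, prev: str) -> bool:
        # Everything up to `start` has been split into valid blocks.
        if start == len(t):
            return True
        remaining = len(t) - start
        # Try every block x of length r that fits at least twice.
        for r in range(1, remaining // 2 + 1):
            root = t[start:start + r]
            if root == prev:
                continue  # adjacent blocks must differ
            d = 1
            # Extend the run of copies of `root` one copy at a time.
            while t.startswith(root, start + d * r):
                d += 1  # now d >= 2 copies of root have been matched
                if parse(start + d * r, root):
                    return True
        return False

    return parse(0, "")


def lsrs_bruteforce(s: str) -> str:
    """Exponential-time search: try subsequences from longest to shortest.
    Only suitable for very short strings."""
    n = len(s)
    for length in range(n, 0, -1):
        for idx in combinations(range(n), length):
            cand = "".join(s[i] for i in idx)
            if is_srs(cand):
                return cand
    return ""  # assumption: the empty sequence is the trivial answer
```

For example, on `"abcab"` this search returns `"abab"`, which is $(ab)^2$ with $k=1$; note that `"aaaa"` is accepted as $a^4$ but not as $a^2a^2$, since adjacent blocks must differ.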