Given a set $I$ of $k$ strings, their longest common subsequence (LCS) is a string of maximum length that is a subsequence of every string in $I$. A data structure for this problem preprocesses $I$ so that the LCS of a set of query strings $Q$ with the strings of $I$ can be computed faster. Since the problem is NP-hard for arbitrary $k$, we allow an error in which some characters may be replaced by other characters. We define the approximation version of the problem with an extra input $m$, the length of the regular expression (regex) that describes the input. The approximation factor is the logarithm of the number of possibilities in the regex returned by the algorithm, divided by the logarithm of the number of possibilities in the regex with the minimum number of possibilities. We then use a tree data structure to achieve sublinear-time LCS queries. We also explain how the idea can be extended to the longest increasing subsequence (LIS) problem.
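To make the definition of the LCS concrete, the following minimal sketch computes an LCS of two strings with the standard textbook dynamic program. It is not the tree data structure or the approximation scheme described above; the function name `lcs` and the example strings are chosen here purely for illustration.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


if __name__ == "__main__":
    # Illustrative inputs; several answers of the same length may exist.
    print(lcs("ABCBDAB", "BDCABA"))
```

This runs in $O(|a| \cdot |b|)$ time for a pair of strings. Generalizing the same table to $k$ strings makes the cost grow exponentially in $k$, which is in line with the hardness noted above for arbitrary $k$.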