We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern $P[1..m]$ in a large repetitive text collection $T[1..n]$, which is represented as a (hopefully much smaller) run-length context-free grammar of size $g_{rl}$. We show that the problem can be solved in time $O(m^2 \log^\epsilon n)$, for any constant $\epsilon > 0$, on a data structure of size $O(g_{rl})$. Further, on a locally consistent grammar of size $O(\delta\log\frac{n}{\delta})$, the time decreases to $O(m\log m(\log m + \log^\epsilon n))$. The value $\delta$ is a function of the substring complexity of $T$, and $\Omega(\delta\log\frac{n}{\delta})$ is a tight lower bound on the compressibility of repetitive texts $T$, so our structure has optimal size in terms of $n$ and $\delta$.
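The abstract does not spell out the two central definitions, so the following brute-force sketch makes them concrete under one common formulation, which is assumed here. A substring $P[i..j]$ is taken to be a MEM if it occurs in $T$ and can be extended neither to the left ($i = 1$ or $P[i-1..j]$ does not occur in $T$) nor to the right ($j = m$ or $P[i..j+1]$ does not occur in $T$). The substring complexity is taken as $\delta = \max_{k \ge 1} d_k(T)/k$, where $d_k(T)$ is the number of distinct length-$k$ substrings of $T$. The function names `mems` and `substring_complexity` are hypothetical. The code works on the uncompressed text and is far from the bounds stated above; it is only a reference for checking small inputs, not the grammar-based method.

```python
def mems(P: str, T: str) -> list[tuple[int, int]]:
    """Brute-force MEMs of P in T, as 1-based inclusive intervals (i, j) of P.

    Assumed definition: P[i..j] occurs in T, and neither P[i-1..j] nor
    P[i..j+1] occurs in T (unless the interval touches an end of P).
    Uses naive substring search; for illustration only.
    """
    m = len(P)
    result = []
    for i in range(m):
        # Substrings of an occurring string also occur, so the occurring
        # prefixes of P[i:] form a contiguous range; find the longest one.
        j = i
        while j < m and P[i:j + 1] in T:
            j += 1
        if j == i:
            continue  # P[i] itself does not occur in T
        # P[i:j] is right-maximal by construction; check left-maximality.
        if i > 0 and P[i - 1:j] in T:
            continue
        result.append((i + 1, j))  # convert to 1-based inclusive interval
    return result


def substring_complexity(T: str) -> float:
    """delta = max over k >= 1 of (number of distinct length-k substrings) / k."""
    n = len(T)
    best = 0.0
    for k in range(1, n + 1):
        d_k = len({T[s:s + k] for s in range(n - k + 1)})
        best = max(best, d_k / k)
    return best


if __name__ == "__main__":
    print(mems("abcd", "abxbcd"))        # MEMs P[1..2] = "ab" and P[2..4] = "bcd"
    print(substring_complexity("abab"))  # d_1 = 2 distinct symbols gives the max
```

For $P = \texttt{abcd}$ and $T = \texttt{abxbcd}$, the intervals starting at positions 3 and 4 of $P$ are discarded because they extend to the left into the occurring substring $\texttt{bcd}$.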