Computing the {\em matching statistics} of a string $P[1..m]$ with respect to a text $T[1..n]$ is a fundamental problem with applications in genome sequence comparison. In this paper, we study the problem of computing matching statistics on highly repetitive texts. We design three data structures that are similar to LZ-compressed indexes. The space cost of each can be expressed in terms of $\gamma$, the size of the smallest string attractor [STOC'2018], and $\delta$, a better measure of repetitiveness [LATIN'2020].
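
To make the problem statement concrete, the following is a minimal sketch of the standard definition of matching statistics: for each position $i$ of $P$, the length of the longest prefix of $P[i..m]$ that occurs somewhere in $T$. It is a naive baseline for illustration only, not one of the compressed data structures studied in the paper. The function name `matching_statistics` is an assumption, and the code uses 0-based indexing, unlike the 1-based notation above.

```python
def matching_statistics(pattern: str, text: str) -> list[int]:
    """Naive matching statistics (illustrative baseline only).

    ms[i] is the length of the longest prefix of pattern[i:]
    that occurs as a substring of text (0-based positions).
    """
    m = len(pattern)
    ms = [0] * m
    for i in range(m):
        length = 0
        # Extend the match one character at a time while the
        # candidate substring still occurs somewhere in the text.
        while i + length < m and pattern[i:i + length + 1] in text:
            length += 1
        ms[i] = length
    return ms


if __name__ == "__main__":
    print(matching_statistics("banana", "ananas"))
```

This sketch rescans the text for every candidate substring, so it is only suitable for small inputs; it serves as a reference against which an index-based solution could be checked.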