The palindrome pattern matching (pal-matching) is a kind of generalized pattern matching, in which two strings $x$ and $y$ of same length are considered to match (pal-match) if they have the same palindromic structures, i.e., for any possible $1 \le i < j \le |x| = |y|$, $x[i..j]$ is a palindrome if and only if $y[i..j]$ is a palindrome. The pal-matching problem is the problem of searching for, in a text, the occurrences of the substrings that pal-match with a pattern. Given a text $T$ of length $n$ over an alphabet of size $\sigma$, an index for pal-matching is to support, given a pattern $P$ of length $m$, the counting queries that compute the number $\mathsf{occ}$ of occurrences of $P$ and the locating queries that compute the occurrences of $P$. The authors in~[I et al., Theor. Comput. Sci., 2013] proposed an $O(n \lg n)$-bit data structure to support the counting queries in $O(m \lg \sigma)$ time and the locating queries in $O(m \lg \sigma + \mathsf{occ})$ time. In this paper, we propose an FM-index type index for the pal-matching problem, which we call the PalFM-index, that occupies $2n \lg \min(\sigma, \lg n) + 2n + o(n)$ bits of space and supports the counting queries in $O(m)$ time. The PalFM-indexes can support the locating queries in $O(m + \Delta \mathsf{occ})$ time by adding $\frac{n}{\Delta} \lg n + n + o(n)$ bits of space, where $\Delta$ is a parameter chosen from $\{1, 2, \dots, n\}$ in the preprocessing phase.
翻译:回文模式匹配(pal-matching)是一种广义的模式匹配,其中如果两个长度相同的字符串$x$和$y$具有相同的回文结构,即对于任何可能的$1 \le i < j \le |x| = |y|$,$x[i..j]$是回文的当且仅当$y[i..j]$是回文的,则视之为匹配(pal-match)。回文匹配问题是在文本中搜索与模式pal-match的子字符串的问题。给定长度为$n$的文本$T$,字母表大小为$\sigma$,支持统计查询以计算出某个模式$P$在$T$中出现$\mathsf{occ}$次的数量,以及支持查询定位以计算$P$的出现情况。[I et al., Theor. Comput. Sci., 2013]的作者提出了一种$O(n \lg n)$比特的数据结构以支持$O(m \lg \sigma)$时间内统计查询和$O(m \lg \sigma + \mathsf{occ})$时间内查询定位。在本文中,我们提出了一种FM索引类型的索引,名为PalFM-index,它占用$2n \lg \min(\sigma, \lg n) + 2n + o(n)$比特的空间并以$O(m)$时间内支持统计查询。通过在预处理阶段添加$\frac{n}{\Delta} \lg n + n + o(n)$比特的空间,可以通过PalFM-indexes以$O(m + \Delta \mathsf{occ})$时间支持查询定位,其中$\Delta$是从$\{1, 2, \dots, n\}$中选择的一个参数。