The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact occurrence of a given pattern $P$. However, practical applications motivate the necessity of considering more complex queries, for example concerning near occurrences of two patterns. Recently, Bille et al. [CPM 2021] introduced a variant of such queries, called gapped consecutive occurrences, in which a query consists of two patterns $P_{1}$ and $P_{2}$ and a range $[a,b]$, and one must find all consecutive occurrences $(q_1,q_2)$ of $P_{1}$ and $P_{2}$ such that $q_2-q_1 \in [a,b]$. By their results, we cannot hope for a very efficient indexing structure for such queries, even if $a=0$ is fixed (although at the same time they provided a non-trivial upper bound). Motivated by this, we focus on a text given as a straight-line program (SLP) and design an index taking space polynomial in the size of the grammar that answers such queries in time optimal up to polylog factors.
翻译:摘要:算法在字符串中的基本问题是索引,即针对特定查询预处理给定的字符串。到目前为止,当查询要求给定模式 $P$ 的确切出现时,我们已经有了一些有效的解决方案。然而,实际应用中,考虑更复杂的查询是必要的,例如涉及两种模式的近似出现情况。最近,Bille等人 [CPM 2021] 引入了这种查询的变体,称为间隙连续出现,其中查询由两个模式 $P_{1}$ 和 $P_{2}$ 和一个范围 $[a,b]$ 组成,必须找到所有满足 $q_2-q_1 \in [a,b]$ 的 $P_{1}$ 和 $P_{2}$ 的连续出现 $(q_1,q_2)$。根据他们的结果,即使 $a=0$ 固定不变,我们也不能指望对这种查询非常有效的索引结构(尽管同时他们提供了一个非平凡的上界)。受此启发,我们专注于以直线程序(SLP)形式给出的文本,并设计了一个索引结构,其空间复杂度为语法大小的多项式,以最优的时间复杂度回答这种查询,直到多项式对数因子。