Position Heaps for Cartesian-tree Matching on Strings and Tries

Cartesian-tree pattern matching is a recently introduced pattern matching scheme that detects fragments of a sequential data stream whose structure is similar to that of a query pattern. Formally, Cartesian-tree pattern matching seeks all substrings $S'$ of the text string $S$ such that the Cartesian tree of $S'$ coincides with that of a query pattern $P$. In this paper, we present a new indexing structure for this problem, called the Cartesian-tree Position Heap (CPH). Let $n$ be the length of the input text string $S$, $m$ the length of a query pattern $P$, and $\sigma$ the alphabet size. We show that the CPH of $S$, denoted $\mathsf{CPH}(S)$, supports pattern matching queries in $O(m (\sigma + \log (\min\{h, m\})) + occ)$ time with $O(n)$ space, where $h$ is the height of the CPH and $occ$ is the number of pattern occurrences. We show how to build $\mathsf{CPH}(S)$ in $O(n \log \sigma)$ time with $O(n)$ working space. Further, we extend the problem to the case where the text is a labeled tree (i.e., a trie). Given a trie $T$ with $N$ nodes, we show that the CPH of $T$, denoted $\mathsf{CPH}(T)$, supports pattern matching queries on the trie in $O(m (\sigma^2 + \log (\min\{h, m\})) + occ)$ time with $O(N \sigma)$ space. We also show a construction algorithm for $\mathsf{CPH}(T)$ running in $O(N \sigma)$ time and $O(N \sigma)$ working space.
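
To make the problem definition concrete, the following is a minimal brute-force sketch of Cartesian-tree matching on strings. It is not the CPH index described above, only a quadratic-time baseline. It assumes the parent-distance encoding known from the Cartesian-tree matching literature: two sequences have the same Cartesian tree (with ties broken toward the leftmost minimum) exactly when their encodings are equal. The function names `parent_distance` and `ct_match_naive` are hypothetical and chosen for illustration.

```python
def parent_distance(s):
    """Parent-distance encoding of s (an assumed helper, not from the paper).

    pd[i] = i - j for the nearest j < i with s[j] <= s[i], or 0 if no such j exists.
    """
    pd = []
    stack = []  # indices whose values are non-decreasing from bottom to top
    for i, x in enumerate(s):
        # Discard earlier positions holding strictly larger values than x.
        while stack and s[stack[-1]] > x:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def ct_match_naive(text, pattern):
    """Return all start positions i such that text[i:i+m] and pattern
    have equal parent-distance encodings. Runs in O(n * m) time."""
    m = len(pattern)
    target = parent_distance(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if parent_distance(text[i:i + m]) == target
    ]


if __name__ == "__main__":
    # The pattern has its minimum in the middle; so do the windows at 0 and 2.
    print(ct_match_naive([5, 3, 4, 1, 6, 2], [2, 1, 3]))  # prints [0, 2]
```

Each window is re-encoded from scratch here, which is what an index such as the CPH is designed to avoid; the sketch only shows which substrings count as occurrences.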
