Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string $T$ is obtained by merging isomorphic subtrees of the suffix tree [Weiner 1973] of the same string $T$, thus CDAWGs are a compact indexing structure. In this paper, we investigate the sensitivity of CDAWGs when a single character edit operation (insertion, deletion, or substitution) is performed at the left-end of the input string $T$, namely, we are interested in the worst-case increase in the size of the CDAWG after a left-end edit operation. We prove that if $e$ is the number of edges of the CDAWG for string $T$, then the number of new edges added to the CDAWG after a left-end edit operation on $T$ is less than $e$. Further, we present almost matching lower bounds on the sensitivity of CDAWGs for all cases of insertion, deletion, and substitution.
翻译:紧凑的有向无环词图(CDAWG)是一种基本数据结构,可在文本模式搜索、数据压缩和模式发现等领域应用于字符串。直观地说,字符串$T$的CDAWG是通过合并后缀树[Weiner 1973] 的同构子树而获得的,因此CDAWG是一种紧凑的索引结构。在本文中,我们研究了CDAWG的敏感性,当在输入字符串$T$的左端执行单个字符编辑操作(插入、删除或替换)时,即我们对左端编辑操作后CDAWG大小的最坏情况感兴趣。我们证明如果$e$是字符串$T$的CDAWG的边数,则在$T$的左端编辑操作后添加到CDAWG中的新边数小于$e$。此外,我们提出了与插入、删除和替换的所有情况的CDAWG敏感性的几乎匹配的下界。