We study the problem of enumerating results from a query over a compressed document. The model we use for compression are straight-line programs (SLPs), which are defined by a context-free grammar that produces a single string. For our queries we use a model called Annotated Automata, an extension of regular automata that allows annotations on letters. This model extends the notion of Regular Spanners as it allows arbitrarily long outputs. Our main result is an algorithm which evaluates such a query by enumerating all results with output-linear delay after a preprocessing phase which takes linear time on the size of the SLP, and cubic time over the size of the automaton. This is an improvement over Schmid and Schweikardt's result, which, with the same preprocessing time, enumerates with a delay which is logarithmic on the size of the uncompressed document. We achieve this through a persistent data structure named Enumerable Compact Sets with Shifts which guarantees output-linear delay under certain restrictions. These results imply constant-delay enumeration algorithms in the context of regular spanners. Further, we use an extension of annotated automata which utilizes succinctly encoded annotations to save an exponential factor from previous results that dealt with constant-delay enumeration over vset automata. Lastly, we extend our results in the same fashion Schmid and Schweikardt did to allow complex document editing while maintaining the constant-delay guarantee.
翻译:我们研究从压缩文档的查询中计算结果的问题。 我们用于压缩的模型是直线程序( SLP), 由生成单一字符串的无上下文语法来定义。 对于我们的查询, 我们使用名为“ 附加自动图” 的模型, 这是允许字母注释的常规自动图的扩展。 这个模型扩展了常规 Spanners 的概念, 因为它允许任意长的输出。 我们的主要结果是一个算法, 通过在预处理阶段( 需要SLP 的大小线性时间, 和自动图的大小的立方时间) 来计算输出线性延迟的所有结果( SLP ) 。 这是Schmid 和 Schweikardt 的结果的改进。 这个模型使用相同的预处理时间, 以延迟的方式进行计算, 因为它允许任意长长的文档的大小。 我们通过一个名为 Enumberable Contract 的连续数据结构来评估这种查询, 与在一定的限制下可以保证输出- 线性延迟。 这些结果意味着, 意味着在常规的内线性文件流流中, 和Sweweikkarard 的结果, 我们用一个不断的顺序的顺序的解算法, 将持续地, 复制到前的序号的顺序的序列结果, 我们使用一个比前的序序序号, 。