Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In the last decade, the task of extracting data from nested documents over streams has become especially relevant. We focus on the streaming evaluation of queries with outputs of varied sizes over nested documents. We model queries of this kind as Visibly Pushdown Transducers (VPT), a computational model that extends visibly pushdown automata with outputs and has the same expressive power as MSO over nested documents. Since processing a document through a VPT can generate a massive number of results, we are interested in reading the input in a streaming fashion and enumerating the outputs one after another as efficiently as possible, namely, with constant-delay. This paper presents an algorithm that enumerates these elements with constant-delay after processing the document stream in a single pass. Furthermore, we show that this algorithm is worst-case optimal in terms of update-time per symbol and memory usage.
翻译:在线使用的一些最相关的文档模型,例如 XML 和 JSON, 具有嵌套格式。 在过去的十年中, 从串流的嵌套文档中提取数据的任务变得特别相关。 我们注重对嵌套文档上的不同大小的查询结果进行流化评估。 我们将这类查询模拟为Visibly Push down Transporters (VPT), 这是一种计算模型, 将输出的明显推下自动数据扩展, 并在嵌套文档上具有与 MSO 相同的表达力。 由于通过 VPT 处理文档可以产生大量结果, 我们有兴趣以流流式方式阅读输入的内容, 并尽可能高效地对输出结果进行逐次统计, 即以恒定调。 本文呈现一种算法, 将这些元素在单条处理文档流后以恒定跳键进行计算。 此外, 我们显示, 从每个符号的更新时间和记忆使用角度来说, 这种算法是最差的。