在持久性字符串和片段选择中随机访问 (Random Access in Persistent Strings and Segment Selection)

We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where edges are labeled by an edit operation (inserting, deleting, or replacing a character) and a node represents the string obtained by applying the sequence of edit operations on the path from the root to the node. The goal is to compactly represent the entire collection while supporting fast random access to any part of a string in the collection. This problem captures natural scenarios such as representing the past history of an edited document or representing highly-repetitive collections. Given a tree with $n$ nodes, we show how to represent the corresponding collection in $O(n)$ space and $O(\log n/ \log \log n)$ query time. This improves the previous time-space trade-offs for the problem. Additionally, we show a lower bound proving that the query time is optimal for any solution using near-linear space. To achieve our bounds for random access in persistent strings we show how to reduce the problem to the following natural geometric selection problem on line segments. Consider a set of horizontal line segments in the plane. Given parameters $i$ and $j$, a segment selection query returns the $j$th smallest segment (the segment with the $j$th smallest $y$-coordinate) among the segments crossing the vertical line through $x$-coordinate $i$. The segment selection problem is to preprocess a set of horizontal line segments into a compact data structure that supports fast segment selection queries. We present a solution that uses $O(n)$ space and support segment selection queries in $O(\log n/ \log \log n)$ time, where $n$ is the number of segments. Furthermore, we prove that that this query time is also optimal for any solution using near-linear space.

翻译：我们考虑相似字符串收藏的缩略图, 支持随机访问查询。字符串的收集由根树提供, 树上边缘由编辑操作标记( 插入、删除或替换字符), 节点代表从根到节点的路径上应用编辑操作序列获得的字符串。目标是在支持快速随机访问某个字符串中的任何部分的同时, 缩略地代表整个收藏。这个问题包含自然情景, 比如代表过去编辑文档的历史或高重复性收藏。在树上标有 $ 的节点, 我们展示如何在$ (n) 的空间和$ (log n/log\log\log\log n) 查询中代表相应的收藏。这样可以改善之前的时间空间交易的偏移。此外, 我们显示一个更低的框点证明, 使用近线间空间的任意访问权限, 我们也可以显示如何将问题降低到这个直线段的自然几何度选择问题 $ 。将一个最小的平流部分用于美元平流部分。