A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ-evaluation (called PG-approach), i.e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
翻译:常规路径查询( RPQ) 是一个常规表达式 q, 该表达式返回所有节点配对( u, v), 以来自 L( q) 的单词标注的任意路径连接的图形数据库中的所有节点配对( u, v) 。 对 RPQ 评估的明显算法方法( 称为 PG- aproach ), 即为 q 构建 NFA 和 图形数据库之间的产品图表, 因其简单性而具有吸引力, 并导致高效的算法。 但是, PG- aproach 的算法是否最佳, 尚不清楚。 我们通过彻底调查PG- aproach 能够达到哪些高复杂度界限, 我们用有条件的较低界限( 细微复杂度框架的意义上) 来补充这一问题。 一个特别的焦点是查点和延迟界限, 以及数据复杂性的视角是: 我们可以用 PG- aproach 来达到最佳( 接近) 算法, 但计时的延迟率相当高( 数据库中的线性 ) 。 我们探索了三种成功的方法, 以亚线延迟延迟延迟的解算法 : : Q 的超级RPPP 的预处理和近近。