A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ-evaluation (called PG-approach), i.e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
翻译:正则路径查询(RPQ)是一个正则表达式 q,它返回与标记为 L(q) 中的单词相关联的任意路径连接的所有节点对(u,v)来自图数据库。明显的算法方法是 RPQ 评估(称为 PG 方法),即在 q 的 NFA 和图数据库之间构造产品图,这是由于其简单性而有吸引力,也导致高效算法。但是,PG 方法是否最优并不清楚。我们通过彻底调查 PG 方法可以实现哪些上限复杂性限制以及在细粒度复杂性框架的条件下提供条件下限来解决这个问题。特别关注枚举和延迟界限,以及数据复杂性方面。一个主要见解是我们可以使用 PG 方法实现最优(或近似最优)的算法,但枚举的延迟相当高(与数据库线性相关)。我们探索了三种成功的枚举方法,其中包括超线性预处理,解决方案集的近似以及受限的 RPQ 类。