Accurate cardinality estimates are a key ingredient in achieving optimal query plans. For RDF engines, particularly under common knowledge graph processing workloads, the lack of a schema, correlated predicates, and diverse query types involving multiple joins make cardinality estimation especially challenging. In this paper, we develop a framework, termed LMKG, that uses deep learning to estimate the cardinality of queries over RDF graphs. We employ both supervised (i.e., deep neural networks) and unsupervised (i.e., autoregressive models) approaches that adapt to the subgraph patterns and produce more accurate cardinality estimates. To feed the underlying data to the models, we put forward a novel encoding that represents the queries as subgraph patterns. Through extensive experiments on both real-world and synthetic datasets, we evaluate our models and show that, overall, they outperform state-of-the-art approaches in terms of accuracy and execution time.
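To make the difficulty with correlated predicates concrete, the following minimal sketch compares the true cardinality of a small star-shaped query with the estimate obtained under a predicate-independence assumption. It is a toy illustration only and is not LMKG's encoding or model: the triples, the query, and the helper names (`subjects_matching`, `true_cardinality`, `independence_estimate`) are all hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]      # (subject, predicate, object)
Pattern = Tuple[str, str]          # (predicate, object) with a shared ?x subject

# Hypothetical toy graph in which "profession" and "award" are correlated.
TRIPLES: List[Triple] = [
    ("alice", "profession", "actor"),
    ("alice", "award", "oscar"),
    ("bob", "profession", "actor"),
    ("bob", "award", "oscar"),
    ("carol", "profession", "chemist"),
    ("carol", "award", "nobel"),
    ("dave", "profession", "chemist"),
]

# Star query: ?x profession actor . ?x award oscar
QUERY: List[Pattern] = [("profession", "actor"), ("award", "oscar")]


def subjects_matching(triples: List[Triple], predicate: str, obj: str) -> Set[str]:
    """Subjects that have the given predicate-object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def true_cardinality(triples: List[Triple], patterns: List[Pattern]) -> int:
    """Exact number of ?x bindings that satisfy every pattern of the star query."""
    matches = [subjects_matching(triples, p, o) for p, o in patterns]
    return len(set.intersection(*matches))


def independence_estimate(triples: List[Triple], patterns: List[Pattern]) -> float:
    """Estimate that multiplies per-pattern selectivities as if they were independent."""
    subjects = {s for s, _, _ in triples}
    estimate = float(len(subjects))
    for p, o in patterns:
        estimate *= len(subjects_matching(triples, p, o)) / len(subjects)
    return estimate


if __name__ == "__main__":
    print("true cardinality:", true_cardinality(TRIPLES, QUERY))
    print("independence estimate:", independence_estimate(TRIPLES, QUERY))
```

In this toy graph every actor also holds an Oscar, so the independence-based estimate comes out at half the true cardinality. Learned estimators of the kind described in the abstract aim to capture such correlations from the data rather than assume them away.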