While the Web of Data in principle offers access to a wide range of interlinked data, the architecture of the Semantic Web today relies mostly on data providers to maintain access to their data through SPARQL endpoints. Several studies, however, have shown that such endpoints often experience downtime, meaning that the data they maintain becomes inaccessible. While decentralized systems based on Peer-to-Peer (P2P) technology have previously been shown to increase the availability of knowledge graphs, even when a large proportion of the nodes fail, processing queries in such a setup can be expensive, since the data necessary to answer a single query might be distributed over multiple nodes. In this paper, we therefore propose Lothbrok, an approach to optimizing SPARQL queries over decentralized knowledge graphs. While there are potentially many aspects to consider when optimizing such queries, we focus on three: cardinality estimation, locality awareness, and data fragmentation. We empirically show that Lothbrok achieves significantly faster query processing than the state of the art, both when processing challenging queries and when the network is under high load.