Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, at a time when the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query lies entirely with the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretical example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves. Under consideration in Theory and Practice of Logic Programming (TPLP).