Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, at a time when the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query rests entirely with the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretical example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves. Under consideration in Theory and Practice of Logic Programming (TPLP).