An increasing number of organisations in almost all fields have started adopting Semantic Web technologies to publish their data as open, linked and interoperable (RDF) datasets, queryable through the SPARQL language and protocol. Link traversal has emerged as a SPARQL query processing method that exploits the Linked Data principles and the dynamic nature of the Web to discover data relevant to a query by resolving online resources (URIs) during query evaluation. However, the execution time of link traversal queries can become prohibitively high for certain query types because of the large number of resources that must be accessed during query execution. In this paper, we propose and evaluate baseline methods for estimating the evaluation cost of link traversal queries. Such methods can be very useful for deciding on the fly which execution strategy to follow for a given query, thereby reducing the load on a SPARQL endpoint and increasing the overall reliability of the query service. To evaluate the performance of the proposed methods, we have created (and make publicly available) a ground truth dataset consisting of 2,425 queries.
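To make the cost issue concrete, the following is a minimal sketch of link traversal over a tiny in-memory stand-in for the Web. It is not one of the estimation methods proposed in the paper. The `MOCK_WEB` data, the `dereference` and `traverse` functions, and the `max_depth` parameter are all hypothetical names introduced for illustration. The sketch counts how many URIs are resolved, which is the quantity the abstract identifies as the main driver of execution time.

```python
from collections import deque

# Hypothetical in-memory "web": each URI maps to the triples that
# resolving (dereferencing) it would return. Illustrative data only.
MOCK_WEB = {
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob"),
                 ("ex:alice", "foaf:knows", "ex:carol")],
    "ex:bob":   [("ex:bob", "foaf:name", "Bob")],
    "ex:carol": [("ex:carol", "foaf:knows", "ex:dave"),
                 ("ex:carol", "foaf:name", "Carol")],
    "ex:dave":  [("ex:dave", "foaf:name", "Dave")],
}


def dereference(uri):
    """Stand-in for an HTTP lookup of a URI; returns its triples."""
    return MOCK_WEB.get(uri, [])


def traverse(seed, predicates, max_depth):
    """Breadth-first link traversal from `seed`.

    Follows only objects of triples whose predicate is in `predicates`,
    up to `max_depth` hops. Returns (collected_triples, resolved_count).
    """
    seen = {seed}
    queue = deque([(seed, 0)])
    triples = []
    resolved = 0
    while queue:
        uri, depth = queue.popleft()
        resolved += 1  # one resource access per dequeued URI
        for s, p, o in dereference(uri):
            triples.append((s, p, o))
            if p in predicates and depth < max_depth and o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return triples, resolved


if __name__ == "__main__":
    for depth in (0, 1, 2):
        _, cost = traverse("ex:alice", {"foaf:knows"}, depth)
        print(f"max_depth={depth}: {cost} URIs resolved")
```

Even on this toy graph, the number of resolved URIs grows with traversal depth. On the real Web each resolution is a network request, so an estimate of this count made before execution could inform the choice of execution strategy.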