As RDF becomes more widely established and the amount of linked data increases rapidly, efficiently querying large amounts of data becomes a significant challenge. In this paper, we propose a family of algorithms for querying large amounts of linked data in a distributed manner. These query evaluation algorithms are independent of the way the data is stored, as well as of the particular implementation of the query evaluation. We then use the MapReduce paradigm to present a distributed implementation of these algorithms and evaluate them experimentally, although the algorithms could be straightforwardly translated into other distributed processing frameworks. We also investigate and propose multiple approaches for decomposing Basic Graph Patterns (a subclass of SPARQL queries), which are used to improve the overall performance of distributed query answering. An in-depth analysis of the effectiveness of these decomposition algorithms is also provided.
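To make the general idea of MapReduce-style Basic Graph Pattern evaluation concrete, the following is a minimal single-process sketch. It assumes a toy in-memory list of triples, treats terms starting with `?` as variables, and simulates one map pass followed by a chain of reduce-side joins on shared variables (one join per simulated MapReduce job). The function names `match`, `map_phase`, `reduce_join`, and `evaluate_bgp` are hypothetical and chosen for illustration only. This is not the algorithm family proposed in the paper, and none of the paper's decomposition strategies are modeled: patterns are simply joined left to right.

```python
from collections import defaultdict
from itertools import product


def is_var(term):
    """Assumption for this sketch: a term is a variable iff it starts with '?'."""
    return term.startswith("?")


def match(pattern, triple):
    """Return a variable binding if the triple matches the pattern, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable repeated in one pattern must bind to the same value.
            if binding.get(p, t) != t:
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding


def map_phase(triples, patterns):
    """Simulated map: emit (pattern_index, binding) for every matching triple."""
    for triple in triples:
        for i, pat in enumerate(patterns):
            b = match(pat, triple)
            if b is not None:
                yield i, b


def reduce_join(left, right, join_vars):
    """Simulated reduce-side join: group both inputs by the values of the
    shared variables, then combine every left/right pair in the same group."""
    groups = defaultdict(lambda: ([], []))
    for b in left:
        groups[tuple(b[v] for v in join_vars)][0].append(b)
    for b in right:
        groups[tuple(b[v] for v in join_vars)][1].append(b)
    out = []
    for ls, rs in groups.values():
        for l, r in product(ls, rs):
            out.append({**l, **r})  # shared variables agree, so merging is safe
    return out


def evaluate_bgp(triples, patterns):
    """Evaluate a BGP: one map pass, then join the patterns left to right."""
    per_pattern = defaultdict(list)
    for i, b in map_phase(triples, patterns):
        per_pattern[i].append(b)
    result = per_pattern[0]
    bound = {v for v in patterns[0] if is_var(v)}
    for i in range(1, len(patterns)):
        pvars = {v for v in patterns[i] if is_var(v)}
        # An empty join_vars list degenerates into a cross product.
        result = reduce_join(result, per_pattern[i], sorted(bound & pvars))
        bound |= pvars
    return result


if __name__ == "__main__":
    data = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("alice", "age", "30"),
        ("bob", "age", "25"),
    ]
    query = [("?x", "knows", "?y"), ("?y", "age", "?a")]
    print(evaluate_bgp(data, query))
    # → [{'?x': 'alice', '?y': 'bob', '?a': '25'}]
```

In a real deployment, each `reduce_join` would correspond to a separate job whose map stage keys bindings by their join-variable values. The number and order of such jobs is exactly what a query decomposition strategy tries to reduce or optimize.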