Background: Geospatial linked data brings a wealth of datasets into the scope of the Semantic Web and its technologies, combining semantically rich descriptions of resources with their geo-location. There are, however, several Semantic Web technologies where technical work is still needed to achieve the full integration of geospatial data, and federated query processing is one of them. Methods: In this paper, we explore the idea of annotating each data source with a bounding polygon that summarizes the spatial extent of the resources it holds, and of using this summary as an (additional) source selection criterion in order to reduce the set of sources that will be tested as potentially holding relevant data. We present our source selection method, and we discuss its correctness and implementation. Results: We evaluate the proposed source selection using three different types of summaries with different degrees of accuracy, against a baseline that uses no geospatial summaries. We use datasets and queries from a practical use case that combines crop-type data with water availability data for food security. The experimental results suggest that more complex summaries lead to slower source selection, but also to more precise exclusion of unneeded sources. Moreover, we observe that the additional source selection runtime is (partially or fully) recovered by shorter planning and execution runtimes. As a result, the federated sources are not burdened by pointless querying from the federation engine. Conclusions: The evaluation draws on data and queries from the agroenvironmental domain and shows that our source selection method substantially improves the effectiveness of federated GeoSPARQL query processing.
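The source selection idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the simplest possible summary, an axis-aligned bounding box rather than a general bounding polygon, and the names `Source`, `bbox_of`, `intersects` and `select_sources`, as well as the sample coordinates, are hypothetical. A source is kept only if its summary intersects the region the query asks about.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical summary type: (min_x, min_y, max_x, max_y).
# The paper uses bounding polygons; a box is the coarsest such summary.
BBox = Tuple[float, float, float, float]
Point = Tuple[float, float]


@dataclass(frozen=True)
class Source:
    """A federated data source annotated with a spatial summary (assumed shape)."""
    name: str
    summary: BBox


def bbox_of(points: Iterable[Point]) -> BBox:
    """Compute the bounding box that covers every point held by a source."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_sources(sources: Iterable[Source], query_region: BBox) -> List[Source]:
    """Keep only the sources whose summary intersects the query region."""
    return [s for s in sources if intersects(s.summary, query_region)]


# Made-up example data: two sources with disjoint spatial extents.
sources = [
    Source("crops", bbox_of([(0.0, 0.0), (4.0, 3.0)])),
    Source("water", bbox_of([(10.0, 10.0), (12.0, 15.0)])),
]
selected = [s.name for s in select_sources(sources, (3.0, 2.0, 6.0, 6.0))]
print(selected)  # → ['crops']
```

As long as a summary covers every geometry in its source, this filter can only discard sources that hold nothing inside the query region. A coarse summary may still retain sources that turn out to be irrelevant, which is the trade-off between summary accuracy and source selection cost that the evaluation examines.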