Finding a good query plan is key to optimizing query runtime. This holds in particular for cost-based federation engines, which rely on cardinality estimates to choose such a plan. A number of studies compare SPARQL federation engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources selected, and number of requests sent. Although informative, these metrics are generic and cannot quantify the accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the effect of cardinality estimation errors on overall query runtime must be measured. In this paper, we address this challenge by presenting novel evaluation metrics targeted at fine-grained benchmarking of cost-based federated SPARQL query engines. We evaluate five cost-based federated SPARQL query engines on LargeRDFBench queries using both existing and novel evaluation metrics. Our detailed analysis of the experimental results reveals novel insights that are useful for the development of future cost-based federated SPARQL query processing engines.