Access plan recommendation is a query optimization approach that executes new queries by reusing previously created query execution plans (QEPs). In this approach, the query optimizer divides the query space into clusters. However, traditional clustering algorithms require substantial execution time when the query dataset is large. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quantities of data. In this study, the Apache Spark and Apache Hadoop frameworks are used to cluster query datasets of different sizes in a MapReduce-based access plan recommendation method. Performance is evaluated in terms of execution time. The experimental results demonstrate that parallel query clustering achieves high scalability. Furthermore, Apache Spark outperformed Apache Hadoop, reaching an average speedup of 2x.
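To make the MapReduce formulation of query clustering concrete, the following is a minimal single-machine sketch of one clustering iteration written as a map phase and a reduce phase. It rests on assumptions not stated in the abstract: k-means is used as the clustering algorithm, each query is represented by a numeric feature vector, and all function names (`nearest`, `map_phase`, `combine`, `reduce_phase`, `kmeans_step`) are hypothetical. It uses plain Python rather than the Spark or Hadoop APIs, so it illustrates the data flow only, not the distributed execution.

```python
from collections import defaultdict
from functools import reduce


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map: emit (cluster_id, (vector, 1)) for every query feature vector."""
    return [(nearest(p, centroids), (p, 1)) for p in points]


def combine(a, b):
    """Reduce function: add two partial (vector_sum, count) pairs."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return ([x + y for x, y in zip(sum_a, sum_b)], n_a + n_b)


def reduce_phase(pairs):
    """Shuffle by cluster id, then reduce each group to its mean vector."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    new_centroids = {}
    for key, values in groups.items():
        total, count = reduce(combine, values)
        new_centroids[key] = [x / count for x in total]
    return new_centroids


def kmeans_step(points, centroids):
    """One iteration; clusters that receive no points keep their old centroid."""
    updated = reduce_phase(map_phase(points, centroids))
    return [updated.get(i, c) for i, c in enumerate(centroids)]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors standing in for encoded queries.
    queries = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    print(kmeans_step(queries, [[0.0, 0.0], [10.0, 10.0]]))
```

Because `combine` is associative and commutative, partial sums can be merged in any order, which is the property that lets a framework such as Spark or Hadoop distribute the reduce phase across workers. In a distributed setting, `kmeans_step` would be repeated until the centroids stop changing.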