Frequent itemset mining (FIM) is a computationally and data-intensive task. Therefore, parallel and distributed FIM algorithms have been designed to process large volumes of data in reduced time. Recently, a number of FIM algorithms have been developed on Hadoop MapReduce, a distributed framework for big data processing. However, due to heavy disk I/O, MapReduce has proven inefficient for highly iterative FIM algorithms. Spark, a more efficient distributed data processing framework, addresses this with in-memory computation and the resilient distributed dataset (RDD) abstraction, which better support iterative algorithms. Apriori- and FP-Growth-based FIM algorithms have been designed on the Spark RDD framework, but an Eclat-based algorithm has not yet been explored. In this paper, RDD-Eclat, a parallel Eclat algorithm on the Spark RDD framework, is proposed along with five variants. The proposed algorithms are evaluated on various benchmark datasets, and the experimental results show that RDD-Eclat outperforms Spark-based Apriori by many times. The results also demonstrate the scalability of the proposed algorithms as the number of cores and the dataset size increase.
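To make the contrast with Apriori concrete, the core idea behind Eclat can be sketched as follows: the dataset is converted to a vertical layout mapping each item to its tidset (the set of transaction IDs containing it), and larger frequent itemsets are found by intersecting tidsets rather than rescanning the data. This is a minimal single-machine sketch of that idea only, not the paper's RDD-Eclat; the function name and structure are illustrative assumptions.

```python
from itertools import combinations

def eclat(transactions, min_support):
    # Hypothetical helper sketching plain Eclat, not the paper's RDD-Eclat.
    # Build the vertical layout: item -> set of transaction IDs (tidset).
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    # Frequent 1-itemsets: those whose tidset meets the support threshold.
    frequent = {frozenset([i]): t for i, t in tidsets.items()
                if len(t) >= min_support}
    result = dict(frequent)
    # Grow itemsets level by level; the support of a candidate is the
    # size of the intersection of its generators' tidsets.
    while frequent:
        next_level = {}
        for (a, ta), (b, tb) in combinations(frequent.items(), 2):
            union = a | b
            if len(union) == len(a) + 1 and union not in next_level:
                t = ta & tb
                if len(t) >= min_support:
                    next_level[union] = t
        result.update(next_level)
        frequent = next_level
    return {itemset: len(t) for itemset, t in result.items()}
```

Because support counting reduces to set intersections on the vertical layout, each branch of the search can proceed independently, which is what makes the approach attractive for partitioning across Spark RDDs.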