With the rapid development of big data technologies, extracting useful information from massive datasets has become an essential problem. However, analyzing large datasets with machine learning algorithms on a traditional single machine can be time-consuming and inefficient. To address this problem, this paper studies the parallelization of several classic machine learning algorithms, both on a single machine and on the big data platform Spark. We compare the runtime and efficiency of the traditional algorithms with those of their parallelized counterparts on a single machine and on Spark. The results show that the parallelized algorithms achieve significant improvements in runtime and efficiency.