This paper applies parallel computing on both CPUs and GPUs to predictive analysis, using the Apache Spark big data platform. A traffic dataset is used to predict traffic jams in Los Angeles County. The data was collected from a popular road-information platform in the USA, which tracks road conditions using device information and reports shared by its users. In this scalable big data system, large-scale traffic datasets can be stored and processed using both GPUs and CPUs. The major contribution of this paper is to improve the performance of machine learning for traffic congestion prediction in a distributed parallel computing system with GPUs. We show that parallel computing can be achieved using both GPUs and CPUs on the existing Apache Spark platform. Our method is applicable to other large-scale datasets in different domains. The modeling process and the results are evaluated by computing time and by three metrics: AUC, precision, and recall. The approach should help traffic management in smart cities.
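To make the three evaluation metrics concrete, the following is a minimal sketch in plain Python of how AUC, precision, and recall can be computed for a binary congestion classifier. It is illustrative only and is not the paper's Spark or GPU pipeline: the function names, the example labels and scores, and the 0.5 decision threshold are all assumptions introduced here.

```python
from typing import Sequence, Tuple


def precision_recall(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for binary labels, where 1 means 'traffic jam'."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative
    one (ties count as one half). This is O(n^2) and meant for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example data: true labels and model scores for six road segments.
example_labels = [1, 0, 1, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 to turn scores into hard predictions.
example_preds = [1 if s >= 0.5 else 0 for s in example_scores]

example_precision, example_recall = precision_recall(example_labels, example_preds)
example_auc = auc(example_labels, example_scores)
```

Precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason to report the three together alongside computing time.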