Distributed data processing platforms (e.g., Hadoop, Spark, and Flink) are widely used to distribute the storage and processing of data among the computing nodes of a cloud. The centralization of cloud resources has given rise to edge computing, which enables data to be processed closer to its source instead of being sent to the cloud. However, due to resource constraints such as energy limitations, edge computing cannot be used to deploy all kinds of applications. Therefore, tasks are offloaded from an edge device to the more resourceful cloud. Previous research has evaluated the energy consumption of distributed data processing platforms in isolated cloud and edge environments. However, there is a paucity of research evaluating the energy consumption of these platforms in an integrated edge-cloud environment, where tasks are offloaded from a resource-constrained device to a resource-rich device. Therefore, in this paper, we first present a framework for the energy-aware evaluation of distributed data processing platforms. We then leverage the proposed framework to evaluate the energy consumption of the three most widely used platforms (i.e., Hadoop, Spark, and Flink) in an integrated edge-cloud environment consisting of a Raspberry Pi, an edge node, an edge server node, a private cloud, and a public cloud. Our evaluation reveals that (i) Flink is the most energy-efficient platform, followed by Spark, with Hadoop the least energy-efficient; (ii) offloading tasks from resource-constrained to resource-rich devices reduces energy consumption by 55.2%; and (iii) the bandwidth and distance between client and server are key factors affecting energy consumption.