The heterogeneous edge-cloud computing paradigm can offer a better way to deploy scientific workflows than traditional distributed computing or cloud computing environments. Because scientific datasets vary in size and some of them must remain private, finding a data placement strategy that minimizes both data transmission time and placement cost remains a difficult problem. To address this issue, this paper combines the advantages of edge and cloud computing to construct a data placement model that balances data transfer time and data placement cost using intelligent computation. The most difficult research challenge the model addresses is accounting for the many constraints of this hybrid computing environment, including datasets shared within individual workflows and among multiple workflows across various geographical regions. Based on the constructed model, the study proposes a new data placement strategy named DE-DPSO-DPS, which uses a discrete particle swarm optimization algorithm with differential evolution (DE-DPSO-DPA) to distribute these scientific datasets. The strategy not only considers characteristics such as the number and storage capacity of edge micro-datacenters, the bandwidth between different datacenters, and the proportion of private datasets, but also analyzes the performance of the algorithm during workflow execution. Comprehensive experiments in simulated heterogeneous edge-cloud computing environments demonstrate that the data placement strategy effectively reduces data transmission time and placement cost compared with traditional strategies for data-sharing scientific workflows.
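To make the trade-off described above concrete, the following is a minimal Python sketch of a discrete swarm search over dataset placements. It is an illustration under stated assumptions, not the DE-DPSO-DPA algorithm itself: the abstract does not give the encoding, the fitness function, or the update rules, so the weighted-sum objective, the fixed task sites, the update probabilities, and every name and number below (`SIZES`, `PRIVATE`, `CAPACITY`, `UNIT_COST`, `BANDWIDTH`, `TASKS`, `search`) are hypothetical.

```python
import random

# Hypothetical toy instance; none of these values come from the paper.
SIZES = [4.0, 2.0, 6.0, 1.0, 3.0]        # dataset sizes (GB)
PRIVATE = {0: 1, 3: 2}                   # private dataset -> datacenter it must stay in
CAPACITY = [float("inf"), 8.0, 8.0]      # datacenter 0 is the cloud, 1 and 2 are edge sites
UNIT_COST = [1.0, 0.2, 0.3]              # assumed placement cost per GB
BANDWIDTH = [[0, 10, 10], [10, 0, 50], [10, 50, 0]]  # GB per time unit between sites
TASKS = [(1, [0, 1]), (2, [2, 3]), (1, [1, 4])]      # (executing site, input datasets)


def feasible(placement):
    """Private datasets stay pinned and no datacenter exceeds its capacity."""
    used = [0.0] * len(CAPACITY)
    for d, dc in enumerate(placement):
        if d in PRIVATE and PRIVATE[d] != dc:
            return False
        used[dc] += SIZES[d]
    return all(u <= c for u, c in zip(used, CAPACITY))


def fitness(placement, weight=0.5):
    """Assumed objective: weighted sum of transfer time and placement cost."""
    transfer = sum(
        SIZES[d] / BANDWIDTH[placement[d]][site]
        for site, inputs in TASKS
        for d in inputs
        if placement[d] != site          # local data needs no transfer
    )
    cost = sum(SIZES[d] * UNIT_COST[dc] for d, dc in enumerate(placement))
    return weight * transfer + (1 - weight) * cost


def search(particles=20, iterations=60, seed=0):
    """Discrete swarm search with a loose, DE-inspired recombination step."""
    rng = random.Random(seed)
    n_dc, n_data = len(CAPACITY), len(SIZES)
    # Baseline: private data pinned, everything else in the cloud.
    baseline = [PRIVATE.get(d, 0) for d in range(n_data)]
    swarm = [baseline[:] for _ in range(particles)]
    personal = [p[:] for p in swarm]
    best = baseline[:]
    for _ in range(iterations):
        for i, current in enumerate(swarm):
            a, b = rng.sample(range(particles), 2)
            trial = current[:]
            for d in range(n_data):
                if d in PRIVATE:
                    continue             # never move a private dataset
                r = rng.random()
                if r < 0.3:
                    trial[d] = personal[i][d]      # pull toward personal best
                elif r < 0.6:
                    trial[d] = best[d]             # pull toward global best
                elif r < 0.8:
                    # DE-inspired: where two random personal bests differ, copy one.
                    if personal[a][d] != personal[b][d]:
                        trial[d] = personal[a][d]
                elif r < 0.9:
                    trial[d] = rng.randrange(n_dc)  # random exploration
            if not feasible(trial):
                continue                 # discard infeasible moves
            swarm[i] = trial
            if fitness(trial) < fitness(personal[i]):
                personal[i] = trial[:]
            if fitness(trial) < fitness(best):
                best = trial[:]
    return best
```

Calling `search()` returns a list that maps each dataset index to a datacenter index. Because the global best starts at the feasible cloud-only baseline and is replaced only by strictly better feasible placements, the result is never worse than that baseline under the assumed objective.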