Scientific Workflows in Heterogeneous Edge-Cloud Computing: A Data Placement Strategy Based on Reinforcement Learning

Compared with cloud computing and other traditional distributed computing environments, the heterogeneous edge-cloud computing paradigm can provide a better solution for deploying scientific workflows. Because scientific datasets differ in size and some of them raise privacy concerns, it is essential to find a data placement strategy that minimizes data transmission time. Some state-of-the-art data placement strategies combine edge computing and cloud computing to distribute scientific datasets. However, dynamically distributing newly generated datasets to appropriate datacenters and removing spent datasets during workflow execution remain a challenge. To address this challenge, this study constructs a data placement model that includes datasets shared within individual workflows and among multiple workflows across geographical regions, and proposes a two-stage data placement strategy (DYM-RL-DPS). First, during the build-time stage of workflows, we use a discrete particle swarm optimization algorithm with differential evolution to pre-allocate initial datasets to suitable datacenters. Then, we reformulate the dynamic dataset distribution problem as a Markov decision process and provide a reinforcement learning-based approach to learn the optimal strategy during the runtime stage of scientific workflows. We designed comprehensive experiments in simulated heterogeneous edge-cloud computing environments to demonstrate the superiority of DYM-RL-DPS. The results show that our strategy effectively reduces data transmission time compared with other strategies.
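
To make the runtime stage more concrete, the following is a minimal sketch of how placing newly generated datasets could be cast as a Markov decision process and solved with tabular Q-learning. It is not the DYM-RL-DPS algorithm itself: the abstract does not specify the state, action, or reward design, so everything here is an assumption made for illustration. The state is taken to be the index of the next dataset plus the remaining storage of each datacenter, the action is the datacenter chosen for that dataset, and the reward is the negative transfer time to the datacenter where the dataset is consumed. The bandwidths, capacities, and dataset list are hypothetical values, and privacy constraints and dataset removal are left out.

```python
import random

# Hypothetical toy environment: edge datacenters 0 and 1, cloud datacenter 2.
# BANDWIDTH[i][j] is an illustrative link speed in MB/s between datacenters i and j.
BANDWIDTH = [
    [0.0, 100.0, 10.0],
    [100.0, 0.0, 20.0],
    [10.0, 20.0, 0.0],
]
CAPACITY = (300, 300, 10_000)  # free storage in MB; the cloud is effectively unbounded

# Datasets generated at runtime, in arrival order:
# (size in MB, datacenter that runs the consuming task). Values are made up.
DATASETS = [(200, 0), (200, 0), (150, 1), (100, 1)]


def transfer_time(size, src, dst):
    """Seconds needed to move `size` MB from datacenter src to datacenter dst."""
    return 0.0 if src == dst else size / BANDWIDTH[src][dst]


def valid_actions(state):
    """Datacenters with enough free storage for the current dataset."""
    index, free = state
    size, _ = DATASETS[index]
    return [dc for dc, room in enumerate(free) if room >= size]


def step(state, action):
    """Place the current dataset in `action`; return (next_state, reward)."""
    index, free = state
    size, consumer = DATASETS[index]
    free = list(free)
    free[action] -= size
    return (index + 1, tuple(free)), -transfer_time(size, action, consumer)


def q_learning(episodes=5000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over placement episodes."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) to estimated return; unseen pairs default to 0.0
    for _ in range(episodes):
        state = (0, CAPACITY)
        while state[0] < len(DATASETS):
            actions = valid_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            next_state, reward = step(state, action)
            if next_state[0] < len(DATASETS):
                future = max(q.get((next_state, a), 0.0)
                             for a in valid_actions(next_state))
            else:
                future = 0.0  # terminal state: no datasets left to place
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = next_state
    return q


def greedy_placement(q):
    """Follow the learned Q-values greedily; return (placement, total transfer time)."""
    state, placement, total = (0, CAPACITY), [], 0.0
    while state[0] < len(DATASETS):
        action = max(valid_actions(state), key=lambda a: q.get((state, a), 0.0))
        state, reward = step(state, action)
        placement.append(action)
        total -= reward
    return placement, total


if __name__ == "__main__":
    placement, total = greedy_placement(q_learning())
    print("placement per dataset:", placement)
    print("total transfer time (s):", total)
```

In this toy setting the storage limits force a trade-off: the two edge datacenters cannot hold every dataset next to its consumer, so the learned policy has to decide which dataset goes to the slower cloud link. A realistic version would need a much richer state (shared datasets, multiple workflows, privacy-pinned datasets) and most likely function approximation instead of a lookup table.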
