FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations

In edge computing, users' service profiles must be migrated in response to user mobility, and reinforcement learning (RL) frameworks have been proposed to do so. However, these frameworks do not consider occasional server failures, which, although rare, can prevent the smooth and safe functioning of latency-sensitive edge computing applications such as autonomous driving and real-time obstacle detection, because users' computing jobs can no longer be completed. Because these failures occur with low probability, it is difficult for RL algorithms, which are inherently data-driven, to learn an optimal service migration solution for both the typical and the rare-event scenarios. We therefore introduce FIRE, a rare-event-adaptive resilience framework that integrates importance sampling into reinforcement learning to place backup services. To learn an optimal policy, we sample rare events at a rate proportional to their contribution to the value function. Our framework balances the service migration trade-off between delay and migration costs against the costs of failure and the costs of backup placement and migration. We propose an importance-sampling-based Q-learning algorithm and prove its boundedness and convergence to optimality. We then propose novel eligibility-trace, linear function approximation, and deep Q-learning versions of the algorithm so that it scales to real-world scenarios. We extend the framework to cater to users with different risk tolerances towards failure. Finally, we use trace-driven experiments to show that our algorithm reduces costs in the event of failures.
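
The idea of combining importance sampling with Q-learning can be illustrated with a toy sketch. It is not the FIRE algorithm itself: the single-state problem, the cost values, the fixed proposal probability `Q_FAIL`, and all names below are assumptions chosen for illustration. FIRE samples rare events at a rate proportional to their contribution to the value function, whereas this sketch simply oversamples failures at a fixed rate. Each update is multiplied by a likelihood ratio so that, in expectation, it matches the update under the true failure probability `P_FAIL`.

```python
import random

# All constants are illustrative assumptions, not values from the paper.
P_FAIL = 0.02         # true (rare) probability of a server failure per step
Q_FAIL = 0.5          # proposal probability used to oversample failures
BACKUP_COST = 1.0     # per-step cost of keeping a backup service
FAILURE_COST = 100.0  # cost of a failure when no backup is in place
GAMMA = 0.9           # discount factor
ALPHA = 0.005         # learning rate


def step_cost(action, failed):
    """Cost of one step: action 0 = no backup, action 1 = keep a backup."""
    if action == 1:
        return BACKUP_COST  # the backup absorbs any failure in this toy model
    return FAILURE_COST if failed else 0.0


def train(steps=50_000, seed=0):
    """Tabular Q-learning on costs with importance-weighted updates."""
    rng = random.Random(seed)
    q = [0.0, 0.0]  # single-state Q-table, one entry per action
    for _ in range(steps):
        action = rng.randrange(2)        # uniform exploration
        failed = rng.random() < Q_FAIL   # failure drawn from the proposal
        # Likelihood ratio corrects for sampling with Q_FAIL instead of P_FAIL.
        weight = P_FAIL / Q_FAIL if failed else (1 - P_FAIL) / (1 - Q_FAIL)
        # Costs are minimized, so the bootstrap term uses min over actions.
        target = step_cost(action, failed) + GAMMA * min(q)
        q[action] += ALPHA * weight * (target - q[action])
    return q


q_values = train()
best_action = min(range(2), key=lambda a: q_values[a])
print("Q-values:", q_values, "preferred action:", best_action)
```

In this toy setting the expected per-step cost without a backup is 0.02 × 100 = 2, against 1 with a backup, so the learned policy should prefer the backup. Under the true failure rate only about one step in fifty would be a failure; the proposal makes failures appear in roughly half of the updates while the weights keep the expected update unchanged.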
