The proliferation of demanding applications and edge computing calls for efficient management of the underlying computing infrastructure and urges providers to rethink their operational methods. In this paper, we propose an Intelligent Proactive Fault Tolerance (IPFT) method that leverages edge resource usage predictions produced by Recurrent Neural Networks (RNNs). More specifically, we focus on process faults, which arise when the infrastructure cannot keep Quality of Service (QoS) within acceptable ranges because of insufficient processing power. To tackle this challenge, we propose a composite deep learning architecture that predicts the resource usage metrics of the edge nodes and triggers proactive node replication and task migration. Because edge computing infrastructure is also highly dynamic and heterogeneous, we propose an innovative Hybrid Bayesian Evolution Strategy (HBES) algorithm for automated adaptation of the resource usage models. The proposed resource usage prediction mechanism has been experimentally evaluated and compared with other state-of-the-art methods, showing significant improvements in terms of Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Additionally, the IPFT mechanism that leverages the resource usage predictions has been evaluated in an extensive simulation in CloudSim Plus, and the results show significant improvement over the reactive fault tolerance method in terms of reliability and maintainability.
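As an illustration only, the two error metrics named above and the idea of a proactive trigger can be sketched in a few lines of Python. The function names, the node identifiers, and the 0.85 utilisation threshold are assumptions introduced for this sketch and are not taken from the IPFT implementation; the predicted values stand in for the output of a trained RNN model.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error between two equally long sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def nodes_to_protect(predicted_cpu, threshold=0.85):
    """Return the nodes whose predicted CPU utilisation exceeds the threshold.

    predicted_cpu maps a node id to a predicted utilisation in [0, 1].
    The threshold is a hypothetical value chosen for this sketch; a
    proactive scheme would replicate these nodes or migrate their tasks
    before the overload actually occurs.
    """
    return sorted(node for node, usage in predicted_cpu.items() if usage > threshold)


if __name__ == "__main__":
    observed = [0.40, 0.55, 0.70]
    forecast = [0.42, 0.50, 0.78]
    print("RMSE:", rmse(observed, forecast))
    print("MAE:", mae(observed, forecast))
    print("Act on:", nodes_to_protect({"edge-1": 0.91, "edge-2": 0.47}))
```

A reactive scheme would apply the same threshold to measured utilisation after the fact; the proactive variant differs only in that it acts on the forecast.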