Reliability is one of the major design criteria in Cyber-Physical Systems (CPSs), because they host critical applications whose failure can be catastrophic. Employing strong error detection and correction mechanisms in CPSs is therefore essential. CPSs are composed of a variety of units, including sensors, networks, and microcontrollers. Any of these units may enter a faulty state at any time, and the resulting fault can produce erroneous output, cause units of the CPS to malfunction, and eventually crash the system. Traditional fault-tolerant approaches rely on time, hardware, information, and/or software redundancy. However, these approaches impose significant overheads while offering low error coverage, which limits their applicability. In addition, the interval between error occurrence and detection is too long in these approaches. In this paper, a new error detection approach based on Deep Reinforcement Learning (DRL) is proposed that not only detects errors with high accuracy but also, owing to its very low inference time, can detect them in real time. The proposed approach can distinguish different types of errors from normal data and predict whether the system will fail. The evaluation results show that the proposed approach improves accuracy by more than 2x and inference time by more than 5x compared to other approaches.