The remarkable performance of machine learning algorithms and deep neural networks in several perception and control tasks is pushing the industry to adopt such technologies in safety-critical applications, such as autonomous robots and self-driving vehicles. At present, however, several issues need to be solved to make deep learning methods more trustworthy, predictable, safe, and secure against adversarial attacks. Although several methods have been proposed to improve the trustworthiness of deep neural networks, most of them are tailored to specific classes of adversarial examples and hence fail to detect other corner cases or unsafe inputs that heavily deviate from the training samples. This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance model robustness against different types of unsafe inputs. In particular, four coverage analysis methods are proposed and tested within the architecture to evaluate multiple detection logics. Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs, while introducing limited overhead in execution time and memory.
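To make the coverage idea concrete, the sketch below shows a minimal runtime monitor in the spirit of activation-range coverage. It is an illustrative assumption, not the four methods proposed in the paper: the class name, thresholds, and the hypothetical helper penultimate() are invented for the example. The monitor records per-neuron activation ranges on in-distribution data and flags inputs whose activations violate those ranges on too many neurons.

```python
# Hypothetical sketch of a coverage-based runtime monitor (not the paper's method).
# Offline phase: record the activation range of each monitored neuron on training data.
# Online phase: flag inputs whose activations leave the recorded ranges on many neurons.
import numpy as np


class RangeCoverageMonitor:
    def __init__(self, tolerance: float = 0.05, max_violations: int = 3):
        self.tolerance = tolerance          # relative margin around recorded ranges
        self.max_violations = max_violations
        self.low = None                     # per-neuron minimum seen during fitting
        self.high = None                    # per-neuron maximum seen during fitting

    def fit(self, activations: np.ndarray) -> None:
        """Record per-neuron activation ranges from in-distribution data
        (shape: [num_samples, num_neurons])."""
        self.low = activations.min(axis=0)
        self.high = activations.max(axis=0)

    def is_unsafe(self, activation: np.ndarray) -> bool:
        """Flag an input whose activation vector violates the recorded range
        on more than `max_violations` neurons."""
        margin = (self.high - self.low) * self.tolerance
        below = activation < self.low - margin
        above = activation > self.high + margin
        return int(np.sum(below | above)) > self.max_violations


# Usage sketch: `penultimate(x)` is a hypothetical hook returning the model's
# penultimate-layer activations for input x.
# monitor = RangeCoverageMonitor()
# monitor.fit(penultimate(training_inputs))
# if monitor.is_unsafe(penultimate(test_input)):
#     reject_or_fall_back(test_input)
```

The offline statistics amount to two vectors per monitored layer, which is consistent with the limited memory and execution-time overhead the abstract reports for coverage-based monitoring in general.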