Recently, there has been significant growth of interest in applying software engineering techniques to the quality assurance of deep learning (DL) systems. One popular direction is deep learning testing, where adversarial examples (a.k.a. bugs) of DL systems are found either by fuzzing or by guided search with the help of certain testing metrics. However, recent studies have revealed that the neuron coverage metrics commonly used by existing DL testing approaches are not correlated with model robustness, nor are they an effective measure of confidence in model robustness after testing. In this work, we address this gap by proposing a novel testing framework called Robustness-Oriented Testing (RobOT). A key part of RobOT is a quantitative measurement of 1) the value of each test case in improving model robustness (often via retraining), and 2) the convergence quality of the model robustness improvement. RobOT utilizes the proposed metric to automatically generate test cases that are valuable for improving model robustness. The proposed metric is also a strong indicator of how well robustness improvement has converged through testing. Experiments on multiple benchmark datasets confirm the effectiveness and efficiency of RobOT in improving DL model robustness, with a 67.02% increase in adversarial robustness, which is 50.65% higher than that of the state-of-the-art work DeepGini.
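The abstract does not spell out the test-case value metric itself. As a rough illustration only, the sketch below (PyTorch, with hypothetical function and variable names) scores a candidate test case by the norm of the model's loss gradient at that input, one plausible first-order proxy for how much a case could contribute when retraining for robustness; it is not presented as the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def test_case_value(model, x, y):
    """Hypothetical first-order value score for a batch of test cases.

    Assumption: a larger gradient norm of the loss w.r.t. the input suggests
    the input lies in a steeper region of the loss surface, so it is more
    likely to expose, and help repair, robustness weaknesses via retraining.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # One scalar score per test case in the batch.
    return grad.flatten(1).norm(p=2, dim=1)

# Example usage (hypothetical): rank a candidate pool and keep the
# highest-value cases for robustness retraining.
# scores = test_case_value(model, candidate_inputs, candidate_labels)
# selected = candidate_inputs[scores.topk(k=128).indices]
```

A ranking of this kind could also serve as a coarse convergence signal: if newly generated test cases consistently receive low value scores, further testing is unlikely to improve robustness much.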