The growing use of deep neural networks (DNNs) in safety- and security-critical areas such as autonomous driving raises the need for their systematic testing. Coverage-guided testing (CGT) applies mutation or fuzzing according to a predefined coverage metric in order to find inputs that cause misbehavior. With the introduction of neuron coverage metrics, CGT has recently also been applied to DNNs. In this work, we apply CGT to the task of person detection in crowded scenes. The proposed pipeline uses YOLOv3 for person detection and comprises finding DNN bugs via sampling and mutation, followed by DNN retraining on the updated training set. For a mutated image to count as a bug, we require it to cause a significant performance drop compared to the clean input. In accordance with CGT, we additionally require increased coverage as part of the bug definition. To explore several types of robustness, our approach includes natural image transformations, corruptions, and adversarial examples generated with the Daedalus attack. The proposed framework has uncovered several thousand cases of incorrect DNN behavior. The relative change in mAP performance of the retrained models averaged between 26.21\% and 64.24\% across the different robustness types. However, we have found no evidence that the investigated coverage metrics can be advantageously used to improve robustness.
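The two-part bug definition above (significant performance drop plus, under CGT, increased coverage) can be sketched as a simple acceptance check. This is a minimal illustrative sketch: the function name, relative-drop threshold, and coverage representation are assumptions, not taken from the paper.

```python
def is_bug(map_clean: float, map_mutated: float,
           cov_before: float, cov_after: float,
           drop_threshold: float = 0.5) -> bool:
    """Decide whether a mutated image counts as a bug.

    Illustrative criterion (threshold value is an assumption):
      1. the mutation causes a significant relative mAP drop
         versus the clean input, and
      2. per CGT, the coverage metric increases.
    """
    relative_drop = (map_clean - map_mutated) / max(map_clean, 1e-9)
    significant_drop = relative_drop >= drop_threshold
    coverage_increased = cov_after > cov_before
    return significant_drop and coverage_increased
```

A mutated image that halves detection quality while activating previously uncovered neurons would pass this check; one that degrades performance without increasing coverage would not.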