Deep learning (DL) systems are increasingly applied to safety-critical domains such as autonomous driving. It is therefore of significant importance to ensure the reliability and robustness of DL systems. Existing testing methodologies often fail to include rare inputs in the testing dataset and exhibit low neuron coverage. In this paper, we propose DLFuzz, the first differential fuzzing testing framework to guide DL systems to expose incorrect behaviors. DLFuzz keeps minutely mutating the input to maximize both the neuron coverage and the prediction difference between the original input and the mutated input, without requiring manual labeling effort or cross-referencing oracles from other DL systems with the same functionality. We present empirical evaluations on two well-known datasets to demonstrate its efficiency. Compared with DeepXplore, the state-of-the-art whitebox testing framework for DL systems, DLFuzz does not require the extra effort of finding DL systems with similar functionality for cross-referencing checks, yet it generates 338.59% more adversarial inputs with 89.82% smaller perturbations, obtains 2.86% higher neuron coverage on average, and reduces time consumption by 20.11%.
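To make the core idea concrete, below is a minimal sketch of the kind of gradient-guided mutation loop the abstract describes: an input is perturbed to jointly maximize the prediction difference from the original label and the activation of selected (coverage-targeted) neurons. This is not the paper's implementation; the model handle, the fixed slice of penultimate-layer units standing in for "uncovered neurons", the loss weighting, and the TensorFlow 2 / Keras setting are all illustrative assumptions.

```python
# Hedged sketch of a DLFuzz-style mutation loop (TensorFlow 2 / Keras).
# Assumptions: `model` is a built Keras classifier with softmax outputs,
# `seed_input` is a batch of shape (1, ...) with values in [0, 1], and the
# penultimate layer produces a 2-D activation (batch, units).
import tensorflow as tf

def dlfuzz_style_mutate(model, seed_input, steps=5, lr=0.02, lam=1.0):
    """Mutate `seed_input` to maximize (a) the prediction difference from
    the original label and (b) the activation of a few target neurons
    (here: an arbitrary slice of the penultimate layer, for illustration)."""
    x = tf.Variable(seed_input, dtype=tf.float32)
    orig_probs = model(seed_input, training=False)
    orig_label = tf.argmax(orig_probs, axis=1)

    # Hypothetical neuron selection: expose the penultimate layer and
    # treat its first five units as the coverage targets.
    hidden = tf.keras.Model(model.inputs, model.layers[-2].output)

    for _ in range(steps):
        with tf.GradientTape() as tape:
            probs = model(x, training=False)
            # (a) push probability mass away from the original class
            pred_diff = -tf.reduce_sum(
                tf.gather(probs, orig_label, axis=1, batch_dims=1))
            # (b) encourage the selected neurons to activate (coverage term)
            cov = tf.reduce_mean(hidden(x, training=False)[:, :5])
            objective = pred_diff + lam * cov
        grad = tape.gradient(objective, x)
        # small sign-based step keeps each mutation "minute"
        x.assign_add(lr * tf.sign(grad))
        x.assign(tf.clip_by_value(x, 0.0, 1.0))

    new_label = tf.argmax(model(x, training=False), axis=1)
    # the mutant counts as adversarial if the prediction flipped
    return x.numpy(), bool(tf.reduce_any(new_label != orig_label))
```

As a usage example, one could pass an MNIST Keras classifier together with a single normalized test image of shape (1, 28, 28, 1); the returned flag indicates whether the perturbed input changed the model's prediction. Note that this sketch keeps perturbations small only via step size and clipping, whereas the paper additionally constrains perturbations and selects coverage neurons by dedicated strategies.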