The reliability of software that includes a Deep Neural Network (DNN) as a component is urgently important, given the growing number of critical applications deployed with DNNs. This need for reliability calls for rigorous testing of the safety and trustworthiness of these systems. Over the last few years, a number of research efforts have focused on testing DNNs. However, the test generation techniques proposed so far do not check whether the test inputs they generate are valid, and as a result they produce invalid inputs. To illustrate this problem, we examined three recent DNN testing techniques. Using deep generative model-based input validation, we show that all three techniques generate a significant number of invalid test inputs. We further analyzed the test coverage achieved by the inputs these techniques generate and show how invalid test inputs can falsely inflate test coverage metrics. To keep invalid inputs out of testing, we propose a technique that incorporates the valid input space of the DNN model under test into the test generation process. Our technique uses a deep generative model-based algorithm to generate only valid inputs. Our empirical results show that the technique is effective in eliminating invalid tests and increasing the number of valid test inputs generated.