Revisiting Neuron Coverage for DNN Testing: A Layer-Wise and Distribution-Aware Criterion

From arXiv. The extended version of a paper to appear in the Proceedings of the 45th IEEE/ACM International Conference on Software Engineering (ICSE '23), 2023. 14 pages.

Various deep neural network (DNN) coverage criteria have been proposed to assess DNN test inputs and steer input mutations. Coverage is characterized either by neurons producing certain outputs or by the discrepancy between neuron outputs. Nevertheless, recent research indicates that neuron coverage criteria show little correlation with test suite quality. In general, DNNs approximate distributions by incorporating hierarchical layers in order to make predictions for inputs. We therefore advocate deducing DNN behaviors from their approximated distributions, viewed from a layer perspective. A test suite should be assessed by the layer output distributions it induces. Accordingly, to fully examine DNN behaviors, input mutation should be directed toward diversifying the approximated distributions. This paper summarizes eight design requirements for DNN coverage criteria, taking into account both distribution properties and practical concerns. We then propose a new criterion, NeuraL Coverage (NLC), that satisfies all of these requirements. NLC treats a single DNN layer, rather than a single neuron, as the basic computational unit and captures four critical properties of neuron output distributions. As a result, NLC accurately describes how DNNs comprehend inputs via approximated distributions. We demonstrate that NLC is significantly correlated with the diversity of a test suite across a number of tasks (classification and generation) and data formats (image and text). Its capacity to discover DNN prediction errors is promising. Test input mutation guided by NLC yields erroneous behaviors of higher quality and greater diversity.
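The abstract contrasts neuron-level coverage with a layer-level, distribution-aware view. The sketch below illustrates that contrast on toy data; it is not the paper's NLC definition. `neuron_coverage` follows the classic threshold-based formulation (the 0.5 threshold and outputs pre-scaled to [0, 1] are assumptions). `layer_score`, `suite_score`, and `keeps_mutant` are hypothetical helpers: they score a layer by the absolute entries of its neuron-output covariance matrix, which reflects only how widely neurons spread and how they co-vary, not all four properties NLC is said to capture. Summing over layers and the "keep a mutant only if the score grows" rule are likewise illustrative assumptions.

```python
from typing import List, Sequence

# One layer's outputs for a test suite: one row per test input, one column per neuron.
LayerOutputs = Sequence[Sequence[float]]


def neuron_coverage(outputs: LayerOutputs, threshold: float = 0.5) -> float:
    """Neuron-level criterion: fraction of neurons whose output exceeds
    `threshold` for at least one input (outputs assumed scaled to [0, 1])."""
    n_neurons = len(outputs[0])
    covered = sum(
        1 for j in range(n_neurons) if any(row[j] > threshold for row in outputs)
    )
    return covered / n_neurons


def covariance(outputs: LayerOutputs) -> List[List[float]]:
    """Sample covariance matrix (N - 1 denominator) of neuron outputs across inputs."""
    m, n = len(outputs), len(outputs[0])
    if m < 2:
        return [[0.0] * n for _ in range(n)]  # no spread can be measured from one input
    means = [sum(row[j] for row in outputs) / m for j in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for row in outputs:
        dev = [row[j] - means[j] for j in range(n)]
        for i in range(n):
            for j in range(n):
                cov[i][j] += dev[i] * dev[j]
    return [[c / (m - 1) for c in cov_row] for cov_row in cov]


def layer_score(outputs: LayerOutputs) -> float:
    """Hypothetical layer-wise score: sum of |covariance| entries. Diagonal terms
    measure how widely each neuron's outputs spread; off-diagonal terms measure
    how neurons in the same layer co-vary."""
    return sum(abs(c) for cov_row in covariance(outputs) for c in cov_row)


def suite_score(per_layer: Sequence[LayerOutputs]) -> float:
    """Aggregate over layers by summation (an assumption, not the paper's rule)."""
    return sum(layer_score(outputs) for outputs in per_layer)


def keeps_mutant(outputs: LayerOutputs, mutant_row: Sequence[float]) -> bool:
    """Coverage-guided acceptance: keep a mutant only if it grows the layer score."""
    return layer_score(list(outputs) + [list(mutant_row)]) > layer_score(outputs)


if __name__ == "__main__":
    # Two toy suites for a hypothetical 2-neuron layer.
    suite_a = [[1.0, 1.0], [1.0, 1.0]]  # both inputs produce identical outputs
    suite_b = [[0.0, 1.0], [1.0, 0.0]]  # inputs exercise the neurons differently
    print(neuron_coverage(suite_a), neuron_coverage(suite_b))  # 1.0 1.0
    print(layer_score(suite_a), layer_score(suite_b))  # 0.0 2.0
    print(keeps_mutant(suite_a, [0.0, 1.0]), keeps_mutant(suite_a, [1.0, 1.0]))  # True False
```

Both toy suites reach 100% neuron coverage, yet only the second actually spreads the layer's outputs, which is the kind of gap the abstract attributes to neuron-level criteria. A practical version would likely capture layer outputs through a framework's forward hooks and update the statistics incrementally instead of recomputing the covariance for every candidate mutant.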
