Neural network test cases are meant to exercise different reasoning paths in an architecture and are used to validate the prediction outcomes. In this paper, we introduce "computational profiles" as vectors of neuron activation levels. We investigate the distribution of the computational-profile likelihood of metamorphic test cases with respect to the likelihood distributions of training, test, and error-control cases. We estimate the non-parametric probability densities of neuron activation levels for each distinct output class. Probabilities are inferred using training cases only, without any additional knowledge about metamorphic test cases. Experiments are performed by training a network on the Fashion-MNIST image dataset and comparing prediction likelihoods with those obtained from error-control data and from metamorphic test cases. Experimental results show that the distributions of computational-profile likelihood for training and test cases are broadly similar, while the likelihood distribution of the random-noise control data is always markedly lower than that observed for the training and test sets. In contrast, metamorphic test cases show a prediction likelihood that spans a wider range than training, test, and random-noise cases. Moreover, the presented approach allows the independent assessment of different training classes, and experiments show that some classes are more prone to misclassifying metamorphic test cases than others. In conclusion, metamorphic test cases represent very aggressive tests for neural network architectures. Furthermore, since metamorphic test cases force a network to misclassify inputs whose likelihood is similar to that of training cases, they could also be considered adversarial attacks that evade defenses based on computational-profile likelihood evaluation.
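The following is a minimal sketch, not the authors' implementation, of how computational-profile likelihoods could be estimated in the setting the abstract describes: activations of a hidden layer form the profile, a non-parametric (kernel) density is fitted per output class using training cases only, and new inputs are scored by their log-likelihood under the predicted class. The layer choice, the per-neuron independence assumption, the network architecture, and the KDE bandwidth are illustrative assumptions, not details taken from the paper.

```python
# Sketch: per-class non-parametric likelihood of "computational profiles"
# (vectors of hidden-neuron activation levels) on Fashion-MNIST.
import numpy as np
import tensorflow as tf
from sklearn.neighbors import KernelDensity

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Small illustrative classifier (architecture is an assumption).
inputs = tf.keras.Input(shape=(28, 28))
flat = tf.keras.layers.Flatten()(inputs)
hidden = tf.keras.layers.Dense(128, activation="relu", name="hidden")(flat)
outputs = tf.keras.layers.Dense(10, activation="softmax")(hidden)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, epochs=2, verbose=0)

# Computational profile = activation vector of the hidden layer.
profile_model = tf.keras.Model(inputs, hidden)
train_profiles = profile_model.predict(x_train, verbose=0)

# One 1-D KDE per (class, neuron), fitted on training cases only.
# Treating neurons as independent is a simplifying assumption of this sketch.
kdes = {
    c: [KernelDensity(bandwidth=0.2).fit(train_profiles[y_train == c][:, [j]])
        for j in range(train_profiles.shape[1])]
    for c in range(10)
}

def profile_log_likelihood(x, c):
    """Summed per-neuron log-likelihood of an input's profile under class c."""
    p = profile_model.predict(x[None, ...], verbose=0)
    return sum(kde.score(p[:, [j]]) for j, kde in enumerate(kdes[c]))

# Usage: score a test image under its predicted class; random-noise control
# data or metamorphic variants of x_test[0] could be scored the same way.
pred = int(np.argmax(model.predict(x_test[:1], verbose=0)))
print(pred, profile_log_likelihood(x_test[0], pred))
```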