As neural networks are increasingly included as core components of safety-critical systems, developing effective testing techniques specialized for them becomes crucial. The bulk of the research has focused on testing neural-network models; but these models are defined by writing programs, and there is growing evidence that these neural-network programs often have bugs too. This paper presents aNNoTest: an approach to generating test inputs for neural-network programs. A fundamental challenge is that the dynamically typed languages (e.g., Python) commonly used to program neural networks cannot express detailed constraints about valid function inputs (e.g., matrices with certain dimensions). Without knowing these constraints, automated test-case generation is prone to producing invalid inputs, which trigger spurious failures and are useless for identifying real bugs. To address this problem, we introduce a simple annotation language tailored for concisely expressing valid function inputs in neural-network programs. aNNoTest takes as input an annotated program, and uses property-based testing to generate random inputs that satisfy the validity constraints. In the paper, we also outline guidelines that simplify writing aNNoTest annotations. We evaluated aNNoTest on 19 neural-network programs from Islam et al.'s survey, which we manually annotated following our guidelines, producing 6 annotations per tested function on average. aNNoTest automatically generated test inputs that revealed 94 bugs, including 63 bugs that the survey reported for these projects. These results suggest that aNNoTest can be a valuable approach to finding widespread bugs in real-world neural-network programs.
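To make the idea of constraint-guided input generation concrete, the following is a minimal sketch in plain Python. It does not use aNNoTest's annotation language, whose syntax is not shown here; the helper names `random_matrix`, `matmul`, and `run_trials` are hypothetical and chosen only for illustration. The sketch assumes a function under test that multiplies two matrices and is valid only when the inner dimensions agree, and it generates random inputs that respect that constraint so that any failure points at the function rather than at an invalid input.

```python
import random


def random_matrix(rng, rows_range, cols_range, lo=-1.0, hi=1.0):
    """Draw a random matrix whose shape lies within the given ranges.

    Hypothetical stand-in for a validity constraint on a matrix argument.
    """
    rows = rng.randint(*rows_range)
    cols = rng.randint(*cols_range)
    return [[rng.uniform(lo, hi) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Function under test: matrix product, valid only if shapes agree."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions differ")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def run_trials(trials=100, seed=0):
    """Generate valid input pairs and count violations of a shape property."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        a = random_matrix(rng, (1, 5), (1, 5))
        # Validity constraint: b has as many rows as a has columns.
        k = len(a[0])
        b = random_matrix(rng, (k, k), (1, 5))
        out = matmul(a, b)
        # Property: the result has a's row count and b's column count.
        if len(out) != len(a) or len(out[0]) != len(b[0]):
            failures += 1
    return failures
```

Calling `run_trials()` exercises `matmul` on 100 valid input pairs. If the second matrix were drawn without the dimension constraint, most trials would raise `ValueError`, which is the kind of spurious failure the abstract describes.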