As neural networks are increasingly included as core components of safety-critical systems, developing effective testing techniques specialized for them becomes crucial. The bulk of the research has focused on testing neural-network models (for instance, their robustness and reliability as classifiers). But neural-network models are defined by writing programs (usually in a programming language like Python), and there is growing evidence that these neural-network programs often have bugs. Thus, being able to effectively test neural-network programs is instrumental to their dependability. This paper presents aNNoTest: an approach to generating test inputs for neural-network programs. A fundamental challenge is that the dynamically-typed languages used to program neural networks cannot express detailed constraints about valid function inputs. Without knowing these constraints, automated test-case generation is prone to producing many invalid inputs, which trigger spurious failures and are useless for identifying real bugs. To address this problem, we introduce a simple annotation language tailored for expressing valid function inputs in neural-network programs. aNNoTest takes an annotated program as input and uses property-based testing to generate random inputs that satisfy the validity constraints. We also outline guidelines that help reduce the effort needed to write aNNoTest annotations. We evaluated aNNoTest on 19 neural-network programs from Islam et al.'s survey. aNNoTest automatically generated test inputs that revealed 94 bugs, including 63 bugs that the survey reported for these projects. These results suggest that aNNoTest can be a cost-effective approach to finding widespread bugs in neural-network programs.
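To illustrate the idea, the following is a minimal sketch of how validity constraints on a neural-network function's inputs can drive property-based test generation, using the Hypothesis library. The function build_dense_layer and the specific strategies are illustrative assumptions, not taken from aNNoTest; aNNoTest's own annotation language and generator may differ.

```python
# Sketch: expressing input-validity constraints as Hypothesis strategies,
# so the test generator produces only valid inputs (hypothetical example;
# not aNNoTest's actual annotation syntax).
import numpy as np
import hypothesis.strategies as st
from hypothesis import given, settings
from hypothesis.extra.numpy import arrays

def build_dense_layer(inputs: np.ndarray, units: int) -> np.ndarray:
    """Toy neural-network program fragment under test."""
    weights = np.random.default_rng(0).normal(size=(inputs.shape[1], units))
    return inputs @ weights

# Validity constraints: a non-empty 2-D float32 batch with bounded,
# finite entries, and a strictly positive layer width.
valid_inputs = arrays(
    dtype=np.float32,
    shape=st.tuples(st.integers(1, 8), st.integers(1, 16)),
    elements=st.floats(-1.0, 1.0, width=32),
)
valid_units = st.integers(min_value=1, max_value=32)

@settings(max_examples=50)
@given(inputs=valid_inputs, units=valid_units)
def test_build_dense_layer(inputs, units):
    out = build_dense_layer(inputs, units)
    # Generic oracles: any crash, shape mismatch, or non-finite output
    # on a *valid* input indicates a real bug, not a spurious failure.
    assert out.shape == (inputs.shape[0], units)
    assert np.all(np.isfinite(out))
```

Because every generated input satisfies the declared constraints, any failure points to a genuine defect in the program rather than to an invalid input, which is the core benefit of annotation-guided generation that the abstract describes.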