Recurrent neural networks (RNNs) have been applied to a broad range of tasks, including natural language processing, drug discovery, and video recognition. Their vulnerability to input perturbation is also well known. Taking a view from software defect detection, this paper develops a coverage-guided testing approach that systematically exploits the internal behaviour of RNNs, with the expectation that such testing can detect defects with high probability. Technically, the long short-term memory network (LSTM), a major class of RNNs, is studied in depth. A family of three test metrics is designed to quantify not only the values but also the temporal relations (both step-wise and bounded-length) exhibited when an LSTM processes inputs. A genetic algorithm is applied to generate test cases efficiently. The test metrics and the test case generation algorithm are implemented in a tool, TestRNN, which is then evaluated on a set of LSTM benchmarks. Experiments confirm that TestRNN has advantages over the state-of-the-art tool DeepStellar and over attack-based defect detection methods, owing to its finer temporal semantics and its consideration of the naturalness of input perturbations. Furthermore, TestRNN collects and presents meaningful information that helps users understand the testing results, an important step towards interpretable neural network testing.
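To make the idea of a step-wise temporal metric more concrete, the following is a minimal sketch and not the definition used by TestRNN. It assumes that each test input yields a trace of hidden-state vectors, that "change" between consecutive steps is measured with an L1 distance, and that a time step counts as covered once at least one test in the suite produces a change at or above a chosen threshold. The names `step_changes`, `step_wise_coverage`, and `threshold` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def step_changes(hidden_states: Sequence[Vector]) -> List[float]:
    """L1 change between consecutive hidden states (assumed measure)."""
    return [
        sum(abs(a - b) for a, b in zip(curr, prev))
        for prev, curr in zip(hidden_states, hidden_states[1:])
    ]


def step_wise_coverage(suite: Sequence[Sequence[Vector]], threshold: float) -> float:
    """Fraction of time steps whose change reaches `threshold` in some test.

    Hypothetical metric for illustration: one test condition per time step.
    """
    if not suite:
        return 0.0
    # Only compare the steps that every trace in the suite actually has.
    num_steps = min(len(trace) for trace in suite) - 1
    if num_steps <= 0:
        return 0.0
    covered = [False] * num_steps
    for trace in suite:
        for t, change in enumerate(step_changes(trace)[:num_steps]):
            if change >= threshold:
                covered[t] = True
    return sum(covered) / num_steps


# Two toy traces of 2-dimensional hidden states over four time steps.
trace_a = [[0.0, 0.0], [0.5, 0.1], [0.5, 0.2], [0.6, 0.2]]
trace_b = [[0.0, 0.0], [0.1, 0.0], [0.9, 0.0], [0.9, 0.1]]
print(step_wise_coverage([trace_a, trace_b], threshold=0.5))
```

In this toy suite, `trace_a` triggers a large change at the first step and `trace_b` at the second, so two of the three step conditions are covered. A test generator such as a genetic algorithm could use the uncovered steps as its search objective, although how TestRNN does this is not specified in the abstract.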