Testing Deep Learning (DL)-based systems inherently requires large and representative test sets to evaluate whether DL systems generalise beyond their training datasets. Diverse Test Input Generators (TIGs) have been proposed to produce artificial inputs that expose issues of DL systems by triggering misbehaviours. Unfortunately, such generated inputs may be invalid, i.e., not recognisable as part of the input domain, thus providing an unreliable quality assessment. Automated validators can ease the burden of manually checking input validity for human testers, although input validity is a concept that is difficult to formalise and, thus, to automate. In this paper, we investigate to what extent TIGs can generate valid inputs, according to both automated and human validators. We conduct a large empirical study involving 2 different automated validators, 220 human assessors, 5 different TIGs, and 3 classification tasks. Our results show that 84% of artificially generated inputs are valid, according to automated validators, but their expected label is not always preserved. Automated validators reach a good consensus with humans (78% accuracy), but still have limitations when dealing with feature-rich datasets.