TestAug: A Framework for Augmenting Capability-based NLP Tests

From arXiv. Accepted by COLING 2022. Presentation video: https://www.youtube.com/watch?v=X0p8J57qxeg; Website: https://guanqun-yang.github.io/testaug/; GitHub: https://github.com/guanqun-yang/testaug

The recently proposed capability-based NLP testing allows model developers to test the functional capabilities of NLP models, revealing functional failures that the traditional held-out evaluation cannot detect. However, existing work on capability-based testing requires extensive manual effort and domain expertise to create the test cases. In this paper, we investigate a low-cost approach to test case generation that leverages the GPT-3 engine. We further propose using a classifier to remove invalid outputs from GPT-3 and expanding the remaining outputs into templates to generate more test cases. Our experiments show that TestAug has three advantages over existing work on behavioral testing: (1) TestAug finds more bugs than existing work; (2) the test cases in TestAug are more diverse; and (3) TestAug substantially reduces the manual effort of creating test suites. The code and data for TestAug are available on our project website (https://guanqun-yang.github.io/testaug/) and GitHub (https://github.com/guanqun-yang/testaug).
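The abstract describes a pipeline of three steps: generate candidate test sentences with GPT-3, drop invalid outputs with a classifier, and expand the surviving sentences into templates. The sketch below illustrates only the last two steps and is not the authors' implementation. The GPT-3 call is omitted, `toy_is_valid` is a hypothetical keyword heuristic standing in for the trained validity classifier (here, for a negation test), and `LEXICON`, `filter_outputs`, `to_template`, and `expand` are hypothetical names chosen for illustration, not the TestAug API.

```python
from itertools import product
from typing import Callable, Dict, List

# Hypothetical lexicon: placeholder name -> interchangeable filler words.
LEXICON: Dict[str, List[str]] = {
    "pos_adj": ["good", "great", "excellent"],
    "food": ["pizza", "pasta", "soup"],
}


def toy_is_valid(sentence: str) -> bool:
    """Hypothetical stand-in for the trained classifier: accept a sentence
    for a negation test only if it contains the word 'not'."""
    return " not " in f" {sentence.lower()} "


def filter_outputs(sentences: List[str], is_valid: Callable[[str], bool]) -> List[str]:
    """Keep only the generated sentences that the validity check accepts."""
    return [s for s in sentences if is_valid(s)]


def to_template(sentence: str, lexicon: Dict[str, List[str]]) -> str:
    """Replace each lexicon word in the sentence with its {placeholder},
    preserving trailing punctuation."""
    out = []
    for tok in sentence.split():
        word = tok.rstrip(".,!?")
        tail = tok[len(word):]
        slot = next((name for name, words in lexicon.items() if word in words), None)
        out.append("{" + slot + "}" + tail if slot else tok)
    return " ".join(out)


def expand(template: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Fill the placeholders with every combination of lexicon entries."""
    slots = [name for name in lexicon if "{" + name + "}" in template]
    return [
        template.format(**dict(zip(slots, combo)))
        for combo in product(*(lexicon[name] for name in slots))
    ]


if __name__ == "__main__":
    # Hypothetical stand-ins for raw GPT-3 outputs.
    generated = ["The pizza was not good.", "The soup was great."]
    valid = filter_outputs(generated, toy_is_valid)
    for sentence in valid:
        template = to_template(sentence, LEXICON)
        print(template)
        for case in expand(template, LEXICON):
            print("  ", case)
```

Expanding one accepted sentence through a lexicon multiplies it into many test cases (here 3 × 3 = 9), which reflects the abstract's point that template expansion generates more test cases from a small number of validated GPT-3 outputs.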

