The development of modern NLP applications often relies on various benchmark datasets containing large numbers of manually labeled tests to evaluate performance. Constructing such datasets consumes substantial resources, yet performance on the held-out data may not properly reflect a model's capability in real-world application scenarios, which can cause serious misunderstanding and monetary loss. To alleviate this problem, in this paper, we propose an automated test generation method for detecting erroneous behaviors of various NLP applications. Our method is designed based on the sentence parsing process of classic linguistics, so it can assemble basic grammatical elements and adjuncts into grammatically correct tests with proper oracle information. We implement this method in NLPLego, which is designed to fully exploit the potential of seed sentences to automate test generation. NLPLego disassembles a seed sentence into a template and adjuncts and then generates new sentences by assembling context-appropriate adjuncts with the template in a specific order. Unlike task-specific methods, the tests generated by NLPLego have derivation relations and different degrees of variation, which makes constructing appropriate metamorphic relations easier. NLPLego is therefore general, meaning it can meet the testing requirements of various NLP applications. To validate NLPLego, we experiment with three common NLP tasks, identifying failures in four state-of-the-art models. Given seed tests from SQuAD 2.0, SST, and QQP, NLPLego successfully detects 1,732, 5,301, and 261,879 incorrect behaviors in the three tasks, respectively, with around 95.7% precision.
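To make the high-level workflow described above concrete (disassembling a seed sentence into a template plus adjuncts, then re-attaching the adjuncts one at a time to obtain derivationally related tests), the following is a minimal Python sketch. All identifiers (`disassemble`, `assemble`, `Adjunct`, the slot representation) are hypothetical illustrations rather than NLPLego's actual API, and the parsing step is stubbed out; a real implementation would rely on a linguistic parse of the seed sentence.

```python
# Hypothetical sketch of the template/adjunct disassemble-assemble idea.
# Names and data structures are illustrative only, not NLPLego's implementation.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Adjunct:
    text: str   # the adjunct phrase, e.g. "in Paris"
    slot: int   # word position in the template where it re-attaches


def disassemble(seed: str) -> Tuple[str, List[Adjunct]]:
    """Split a seed sentence into a core template and its adjuncts.

    Stub: a real implementation would derive the template and adjuncts
    from a constituency/dependency parse of `seed`.
    """
    template = "The team won the cup."
    adjuncts = [Adjunct("founded in 1901", slot=2),
                Adjunct("in Paris", slot=5)]
    return template, adjuncts


def assemble(template: str, adjuncts: List[Adjunct]) -> List[str]:
    """Re-attach adjuncts one at a time, yielding derivationally related tests.

    Each generated sentence differs from its predecessor by exactly one
    adjunct, which is what makes metamorphic relations easy to state
    (e.g. the model's prediction should stay consistent across the chain).
    """
    tests = [template]
    words = template.rstrip(".").split()
    # Insert at later slots first so earlier slot indices stay valid.
    for adj in sorted(adjuncts, key=lambda a: a.slot, reverse=True):
        words = words[:adj.slot] + adj.text.split() + words[adj.slot:]
        tests.append(" ".join(words) + ".")
    return tests


if __name__ == "__main__":
    tmpl, adjs = disassemble("The team, founded in 1901, won the cup in Paris.")
    for t in assemble(tmpl, adjs):
        print(t)
```

Running this sketch prints a chain of sentences, each extending the previous one by a single adjunct, which illustrates why the generated tests carry derivation relations and graded degrees of variation.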