Improving Paraphrase Detection with the Adversarial Paraphrasing Task

If two sentences have the same meaning, it should follow that they are equivalent in their inferential properties, i.e., each sentence should textually entail the other. However, many paraphrase datasets currently in widespread use rely on a sense of paraphrase based on word overlap and syntax. Can we instead teach paraphrase identification models to identify paraphrases in a way that draws on the inferential properties of the sentences and does not over-rely on the lexical and syntactic similarities of a sentence pair? We apply the adversarial paradigm to this question and introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT), which asks participants to generate paraphrases that are semantically equivalent (in the sense of being mutually implicative) but lexically and syntactically disparate. These sentence pairs can then be used both to test paraphrase identification models (which barely reach chance-level accuracy) and to improve their performance. To accelerate dataset generation, we explore automating APT using T5, and show that the resulting dataset also improves accuracy. We discuss implications for paraphrase detection and release our dataset in the hope of making paraphrase detection models better able to detect sentence-level meaning equivalence.
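
The acceptance criterion described above (mutual implication plus lexical dissimilarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `entails` callable, the token-level Jaccard overlap used as a stand-in for lexical similarity, and the 0.5 threshold are all assumptions made for illustration. In practice `entails` would wrap a textual entailment model; here it is a toy lookup table.

```python
from typing import Callable, Set, Tuple


def token_jaccard(a: str, b: str) -> float:
    """Word-overlap score in [0, 1]: shared tokens over all distinct tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_apt_pair(
    s1: str,
    s2: str,
    entails: Callable[[str, str], bool],
    max_overlap: float = 0.5,  # hypothetical threshold, not from the paper
) -> bool:
    """Accept a pair only if each sentence entails the other AND the
    two sentences share few words (the adversarial part)."""
    mutually_implicative = entails(s1, s2) and entails(s2, s1)
    return mutually_implicative and token_jaccard(s1, s2) <= max_overlap


# Toy entailment oracle (assumption): a lookup table of known entailments.
KNOWN: Set[Tuple[str, str]] = {
    ("the cat chased the mouse", "a rodent was pursued by the feline"),
    ("a rodent was pursued by the feline", "the cat chased the mouse"),
    ("the cat chased the mouse", "an animal moved"),  # one direction only
}


def toy_entails(premise: str, hypothesis: str) -> bool:
    return premise == hypothesis or (premise, hypothesis) in KNOWN


if __name__ == "__main__":
    a = "the cat chased the mouse"
    print(is_apt_pair(a, "a rodent was pursued by the feline", toy_entails))
    print(is_apt_pair(a, a, toy_entails))
    print(is_apt_pair(a, "an animal moved", toy_entails))
```

Under these assumptions, the first pair is accepted, the identical pair is rejected for being lexically too similar, and the third is rejected because entailment holds in only one direction.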
