The rise of language models such as BERT allows for high-quality text paraphrasing. This poses a problem for academic integrity, as it is difficult to differentiate between original and machine-generated content. We propose a benchmark consisting of articles paraphrased with recent language models based on the Transformer architecture. Our contribution fosters future research on paraphrase detection systems: it offers a large collection of aligned original and paraphrased documents, a study of the collection's structure, and classification experiments with state-of-the-art systems, and we make our findings publicly available.