The recent success of large language models for text generation poses a severe threat to academic integrity, as plagiarists can generate realistic paraphrases that are indistinguishable from original work. However, the role of large autoregressive transformers in generating machine-paraphrased plagiarism, and the detection of such plagiarism, remains underexplored in the literature. This work explores T5 and GPT-3 for machine-paraphrase generation on scientific articles from arXiv, student theses, and Wikipedia. We evaluate the detection performance of six automated solutions and one commercial plagiarism detection software tool, and we conduct a human study with 105 participants to assess their detection performance and the quality of the generated examples. Our results suggest that large models can rewrite text in ways humans have difficulty identifying as machine-paraphrased (53% mean accuracy). Human experts rate the quality of paraphrases generated by GPT-3 as highly as original texts (clarity 4.0/5, fluency 4.2/5, coherence 3.8/5). The best-performing detection model (GPT-3) achieves a 66% F1-score in detecting paraphrases.