Recent advances in generative language models have enabled the creation of convincing synthetic text, or "deepfake text." Prior work has demonstrated the potential for misuse of deepfake text to mislead content consumers. Therefore, deepfake text detection, the task of discriminating between human-written and machine-generated text, is becoming increasingly critical. Several defenses have been proposed for deepfake text detection, but we lack a thorough understanding of their real-world applicability. In this paper, we collect deepfake text from four online services powered by Transformer-based tools to evaluate the generalization ability of these defenses to content in the wild. We also develop several low-cost adversarial attacks and investigate the robustness of existing defenses against an adaptive attacker. We find that the performance of many defenses degrades significantly under our evaluation scenarios compared to their originally claimed performance. Our evaluation shows that tapping into the semantic information in the text content is a promising approach for improving the robustness and generalization performance of deepfake text detection schemes.