The democratization of language generation models benefits many domains, from answering health-related questions to enhancing education through AI-driven tutoring services. However, it also makes it easier to generate human-like text at scale for nefarious activities, from spreading misinformation to targeting specific groups with hate speech. It is therefore essential to understand how people interact with bots and to develop methods for detecting bot-generated text. This paper shows that bot-generated text detection methods are more robust across datasets and models when they use information about how people respond to bot-generated text rather than the bot's text directly. We also analyze linguistic alignment, providing insight into the differences between human-human and human-bot conversations.
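To make the central idea concrete, here is a minimal sketch, assuming a scikit-learn setup. All names and the toy data are hypothetical illustrations of response-based detection, not the paper's actual pipeline: the classifier sees only the human partner's responses in a conversation, never the candidate bot's own text.

```python
# Hypothetical sketch of response-based bot detection: classify a
# conversation as human-human or human-bot using only the text of the
# *human partner's* responses, not the suspected bot's own messages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each example is the concatenated responses of the human side of a
# conversation; labels mark whether the interlocutor was a bot.
# (Toy data for illustration only.)
human_responses = [
    "haha what? that makes no sense, are you a bot",
    "interesting point, I had not thought about it that way",
]
labels = [1, 0]  # 1 = talking to a bot, 0 = talking to a human

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
detector.fit(human_responses, labels)

# Predict from a new human response alone.
print(detector.predict(["wait, you already said that exact sentence"]))
```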