The rapid adoption of generative language models has brought about substantial advancements in digital communication, while simultaneously raising concerns regarding the potential misuse of AI-generated content. Although numerous detection methods have been proposed to differentiate between AI and human-generated content, the fairness and robustness of these detectors remain underexplored. In this study, we evaluate the performance of several widely-used GPT detectors using writing samples from native and non-native English writers. Our findings reveal that these detectors consistently misclassify non-native English writing samples as AI-generated, whereas native writing samples are accurately identified. Furthermore, we demonstrate that simple prompting strategies can not only mitigate this bias but also effectively bypass GPT detectors, suggesting that GPT detectors may unintentionally penalize writers with constrained linguistic expressions. Our results call for a broader conversation about the ethical implications of deploying ChatGPT content detectors and caution against their use in evaluative or educational settings, particularly when they may inadvertently penalize or exclude non-native English speakers from the global discourse.