The rapid adoption of generative language models has brought about substantial advancements in digital communication, while simultaneously raising concerns regarding the potential misuse of AI-generated content. Although numerous detection methods have been proposed to differentiate between AI and human-generated content, the fairness and robustness of these detectors remain underexplored. In this study, we evaluate the performance of several widely-used GPT detectors using writing samples from native and non-native English writers. Our findings reveal that these detectors consistently misclassify non-native English writing samples as AI-generated, whereas native writing samples are accurately identified. Furthermore, we demonstrate that simple prompting strategies can not only mitigate this bias but also effectively bypass GPT detectors, suggesting that GPT detectors may unintentionally penalize writers with constrained linguistic expressions. Our results call for a broader conversation about the ethical implications of deploying ChatGPT content detectors and caution against their use in evaluative or educational settings, particularly when they may inadvertently penalize or exclude non-native English speakers from the global discourse.