ChatGPT中的毒性：分析被赋予特定人设的语言模型 (Toxicity in ChatGPT: Analyzing Persona-assigned Language Models)

Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with critical information needs like students or patients engaging with chatbots, the safety of these systems is of prime importance. Therefore, a clear understanding of the capabilities and limitations of LLMs is necessary. To this end, we systematically evaluate toxicity in over half a million generations of ChatGPT, a popular dialogue-based LLM. We find that setting the system parameter of ChatGPT by assigning it a persona, say that of the boxer Muhammad Ali, significantly increases the toxicity of generations. Depending on the persona assigned to ChatGPT, its toxicity can increase up to 6x, with outputs engaging in incorrect stereotypes, harmful dialogue, and hurtful opinions. This may be potentially defamatory to the persona and harmful to an unsuspecting user. Furthermore, we find concerning patterns where specific entities (e.g., certain races) are targeted more than others (3x more) irrespective of the assigned persona, that reflect inherent discriminatory biases in the model. We hope that our findings inspire the broader AI community to rethink the efficacy of current safety guardrails and develop better techniques that lead to robust, safe, and trustworthy AI systems.

翻译：大型语言模型（LLMs）展现出惊人的能力并超越了自然语言处理（NLP）社区，被广泛应用于医疗、治疗、教育和客户服务等多个领域。由于使用这些系统的用户包括具有重要信息需求的人，如与聊天机器人交互的学生或患者，因此这些系统的安全性至关重要。因此，必须清楚了解LLMs的能力和局限性。为此，我们系统评估了ChatGPT（一种流行的基于对话的LLM）中超过500,000个生成的毒性。我们发现，通过为ChatGPT分配特定人设（如拳击手穆罕默德·阿里），可以显著增加生成的毒性。根据分配给ChatGPT的人设，其毒性可能增加多达6倍，其输出参与不正确的刻板印象、有害的对话和伤害性的观点。这可能会对人设造成潜在诽谤，并对不知情的用户造成伤害。此外，我们发现令人担忧的模式，即特定实体（例如某些种族）被针对的比其他实体多（高达3倍），独立于所分配的人设，反映了该模型内在的歧视偏见。我们希望我们的发现能激励更广泛的AI社区重新思考当前安全防护栏的效力，并开发更好的技术，从而实现强大、安全和值得信赖的AI系统。

相关内容

ChatGPT

关注 257

ChatGPT（全名：Chat Generative Pre-trained Transformer），美国OpenAI 研发的聊天机器人程序 [1] ，于2022年11月30日发布。ChatGPT是人工智能技术驱动的自然语言处理工具，它能够通过学习和理解人类的语言来进行对话，还能根据聊天的上下文进行互动，真正像人类一样来聊天交流，甚至能完成撰写邮件、视频脚本、文案、翻译、代码，写论文任务。 [1] https://openai.com/blog/chatgpt/