"HOT" ChatGPT: The promise of ChatGPT in detecting and discriminating hateful, offensive, and toxic comments on social media

Harmful content is pervasive on social media, poisoning online communities and negatively impacting participation. A common approach to address this issue is to develop detection models that rely on human annotations. However, the tasks required to build such models expose annotators to harmful and offensive content and may require significant time and cost to complete. Generative AI models have the potential to understand and detect harmful content. To investigate this potential, we used ChatGPT and compared its performance with MTurker annotations for three frequently discussed concepts related to harmful content: Hateful, Offensive, and Toxic (HOT). We designed five prompts to interact with ChatGPT and conducted four experiments eliciting HOT classifications. Our results show that ChatGPT can achieve an accuracy of approximately 80% when compared to MTurker annotations. Specifically, the model displays a more consistent classification for non-HOT comments than HOT comments compared to human annotations. Our findings also suggest that ChatGPT classifications align with provided HOT definitions, but ChatGPT classifies "hateful" and "offensive" as subsets of "toxic." Moreover, the choice of prompts used to interact with ChatGPT impacts its performance. Based on these insights, our study provides several meaningful implications for employing ChatGPT to detect HOT content, particularly regarding the reliability and consistency of its performance, its understanding and reasoning of the HOT concept, and the impact of prompts on its performance. Overall, our study provides guidance about the potential of using generative AI models to moderate large volumes of user-generated content on social media.
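
The comparison described in the abstract (model classifications scored against MTurker annotations, with agreement reported separately for HOT and non-HOT comments) can be illustrated with a small sketch. This is not the authors' code: the majority-vote aggregation of annotator votes, the function names, and the toy data are all assumptions made for illustration, and the abstract does not say how the annotations were aggregated.

```python
from collections import Counter


def majority_label(votes):
    """Collapse annotator votes (True = HOT) into one label.

    Assumption: simple majority vote; a tie is treated as non-HOT.
    """
    counts = Counter(votes)
    return counts[True] > counts[False]


def agreement_report(model_labels, human_votes):
    """Compare model labels with majority-vote human labels.

    Returns overall accuracy plus agreement computed separately on
    comments the annotators judged HOT and non-HOT.
    """
    gold = [majority_label(v) for v in human_votes]
    pairs = list(zip(model_labels, gold))

    def rate(subset):
        # Fraction of pairs where model and humans agree; None if empty.
        if not subset:
            return None
        return sum(m == g for m, g in subset) / len(subset)

    return {
        "accuracy": rate(pairs),
        "agreement_on_hot": rate([p for p in pairs if p[1]]),
        "agreement_on_non_hot": rate([p for p in pairs if not p[1]]),
    }


# Hypothetical toy data: five comments, three annotator votes each.
model_labels = [True, False, False, True, False]
human_votes = [
    [True, True, False],
    [False, False, False],
    [True, True, True],
    [False, False, True],
    [False, True, False],
]

report = agreement_report(model_labels, human_votes)
print(report)
```

Reporting the HOT and non-HOT rates separately, rather than a single accuracy figure, is what makes a gap in consistency between the two groups visible.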

Related content

ChatGPT

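ChatGPT (full name: Chat Generative Pre-trained Transformer) is a chatbot program developed by the US company OpenAI [1] and released on November 30, 2022. ChatGPT is an AI-driven natural language processing tool that converses by learning and understanding human language, responds according to the context of the conversation, and chats much as a person would. It can also handle tasks such as writing emails, video scripts, marketing copy, translations, code, and papers. [1] https://openai.com/blog/chatgpt/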

AI future trends and challenges through the lens of ChatGPT | A 10,000-character long read

Zhuanzhi Member Service

174+ reads · April 18, 2023

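ChatGPT leads AIGC! Lehigh's latest comprehensive survey "AI-Generated Content (AIGC)": a 44-page PDF detailing the development from GAN to ChatGPT

Zhuanzhi Member Service

171+ reads · March 14, 2023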

GANs for Anomaly Detection (anomaly detection with generative adversarial networks)

Zhuanzhi Member Service

34+ reads · September 16, 2021