Although large language models (LLMs) have shown exceptional performance in various natural language processing tasks, they are prone to hallucinations. State-of-the-art chatbots, such as the new Bing, attempt to mitigate this issue by gathering information directly from the internet to ground their answers. In this setting, the capacity to distinguish trustworthy sources is critical for providing appropriate accuracy contexts to users. Here we assess whether ChatGPT, a prominent LLM, can evaluate the credibility of news outlets. With appropriate instructions, ChatGPT can provide ratings for a diverse set of news outlets, including those in non-English languages and satirical sources, along with contextual explanations. Our results show that these ratings correlate with those from human experts (Spearman's $\rho=0.54$, $p<0.001$). These findings suggest that LLMs could be an affordable reference for credibility ratings in fact-checking applications. Future LLMs should enhance their alignment with human expert judgments of source credibility to improve information accuracy.
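As a rough illustration of the setup described in the abstract, the sketch below shows how one might prompt an LLM for a numeric credibility rating per news outlet and correlate the results with expert ratings using Spearman's $\rho$. This is a minimal sketch, not the paper's exact protocol: the prompt wording, model name, rating scale, and expert scores are illustrative assumptions, and it relies on the `openai` client and `scipy.stats.spearmanr`.

```python
# Minimal sketch (not the authors' exact protocol): elicit credibility ratings
# from an LLM and compare them with expert ratings via Spearman's rho.
# Prompt wording, model name, outlet list, and expert scores are illustrative.
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def rate_outlet(domain: str) -> float:
    """Ask the model for a 0-1 credibility rating of a news outlet's domain."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative choice; the paper evaluates ChatGPT
        messages=[{
            "role": "user",
            "content": (
                f"Rate the credibility of the news source {domain} "
                "on a scale from 0 (lowest) to 1 (highest). "
                "Reply with only the number."
            ),
        }],
        temperature=0,
    )
    return float(response.choices[0].message.content.strip())


# Hypothetical expert ratings (e.g., drawn from a professional source-rating service).
expert = {"example-news.com": 0.95, "satire-site.com": 0.20, "tabloid.net": 0.40}

llm_scores = [rate_outlet(domain) for domain in expert]
rho, p = spearmanr(llm_scores, list(expert.values()))
print(f"Spearman's rho = {rho:.2f}, p = {p:.3g}")
```

In practice, the correlation would be computed over a large set of outlets with established expert ratings; the three domains above merely stand in for such a dataset.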