Judgmental forecasting is the task of making predictions about future events based on human judgment. This task can be seen as a form of claim verification, where the claim corresponds to a future event and the task is to assess the plausibility of that event. In this paper, we propose a novel multi-agent framework for claim verification, whereby different agents may disagree on claim veracity and bring specific evidence for and against the claims, represented as quantitative bipolar argumentation frameworks (QBAFs). We then instantiate the framework for supporting claim verification, with a variety of agents realised with Large Language Models (LLMs): (1) ArgLLM agents, an existing approach for claim verification that generates and evaluates QBAFs; (2) RbAM agents, whereby LLM-empowered Relation-based Argument Mining (RbAM) from external sources is used to generate QBAFs; (3) RAG-ArgLLM agents, extending ArgLLM agents with a form of Retrieval-Augmented Generation (RAG) of arguments from external sources. Finally, we conduct experiments on two standard judgmental forecasting datasets, instantiating our framework with two or three agents, empowered by six different base LLMs. We observe that combining evidence from agents can improve forecasting accuracy, especially in the case of three agents, while providing an explainable combination of evidence for claim verification.
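To make the central data structure concrete, the following is a minimal sketch of a QBAF and its evaluation under a gradual semantics (here DF-QuAD, a standard choice for QBAFs; the toy claim, argument names, and base scores are illustrative, not taken from the paper or its datasets):

```python
# Minimal QBAF sketch: arguments carry base scores, and attack/support
# relations adjust them under the DF-QuAD gradual semantics.
# Assumes an acyclic (tree-shaped) QBAF, as produced per claim.

def aggregate(strengths):
    """Probabilistic-sum aggregation of a list of child strengths."""
    result = 0.0
    for s in strengths:
        result = result + s - result * s
    return result

def strength(arg, base, attackers, supporters):
    """Final dialectical strength of `arg` in an acyclic QBAF."""
    va = aggregate([strength(a, base, attackers, supporters)
                    for a in attackers.get(arg, [])])
    vs = aggregate([strength(s, base, attackers, supporters)
                    for s in supporters.get(arg, [])])
    tau = base[arg]
    if va > vs:
        return tau - tau * (va - vs)        # attacks dominate: weaken
    return tau + (1 - tau) * (vs - va)      # supports dominate: strengthen

# Toy claim with one attacking and one supporting argument.
base = {"claim": 0.5, "pro": 0.6, "con": 0.4}
attackers = {"claim": ["con"]}
supporters = {"claim": ["pro"]}
print(round(strength("claim", base, attackers, supporters), 3))  # → 0.6
```

In a multi-agent instantiation, each agent would contribute its own arguments and relations to such a framework, and the resulting strengths provide the explainable combination of evidence referred to above.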