Teaming LLMs to Detect and Mitigate Hallucinations

Recent work has demonstrated state-of-the-art results in large language model (LLM) hallucination detection and mitigation through consistency-based approaches, which aggregate multiple responses sampled from a single LLM for a given prompt. These approaches help offset limitations of the imperfect data on which LLMs are trained, including biases and under-representation of information required at deployment time, which can lead to hallucinations. We show that extending these single-model consistency methods to combine responses from multiple LLMs with different training data, training schemes, and model architectures can yield substantial further improvements in hallucination detection and mitigation over their single-model counterparts. We evaluate this "consortium consistency" approach across many model teams drawn from a pool of 15 LLMs and explore the conditions under which teaming different LLMs in this manner is beneficial. Further, we show that these performance improvements often come with reduced inference costs, offsetting a significant drawback of single-model consistency methods.
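To make the idea of aggregating responses concrete, the following is a minimal sketch, not the authors' implementation. It assumes short-form answers that can be compared after simple normalization, uses majority voting as the aggregation rule, and treats the agreement rate as a consistency score (a low score suggesting a possible hallucination). The function names, the normalization step, and the model labels are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def normalize(response: str) -> str:
    # Hypothetical normalization: trim whitespace and lowercase so that
    # trivially different strings count as the same answer.
    return response.strip().lower()


def aggregate(responses: list[str]) -> tuple[str, float]:
    # Majority vote over normalized responses.
    # Returns the most common answer and the fraction of responses agreeing with it.
    counts = Counter(normalize(r) for r in responses)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(responses)


def single_model_consistency(samples: list[str]) -> tuple[str, float]:
    # Single-model setting: all samples come from one LLM.
    return aggregate(samples)


def consortium_consistency(samples_by_model: dict[str, list[str]]) -> tuple[str, float]:
    # Multi-model setting (an assumed pooling scheme): pool the samples
    # from every model in the team, then apply the same majority vote.
    pooled = [r for samples in samples_by_model.values() for r in samples]
    return aggregate(pooled)


# Illustrative, made-up samples for one prompt from two hypothetical models.
samples_by_model = {
    "model_a": ["Paris", "paris ", "Lyon"],
    "model_b": ["Paris", "Paris"],
}

single_answer, single_score = single_model_consistency(samples_by_model["model_a"])
team_answer, team_score = consortium_consistency(samples_by_model)
print(single_answer, round(single_score, 2))
print(team_answer, round(team_score, 2))
```

In this sketch the answer could be flagged as a likely hallucination when the score falls below a chosen threshold; the threshold, like the pooling scheme, is an assumption rather than something the abstract specifies.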
