Recent advances in Large Language Models, such as ChatGPT, have demonstrated significant potential to impact many aspects of human life. However, ChatGPT still faces challenges in areas such as faithfulness. Taking question answering as a representative application, we seek to understand why ChatGPT falls short in answering questions faithfully. To this end, we analyze the failures of ChatGPT in complex open-domain question answering and identify the abilities underlying these failures. Specifically, we categorize ChatGPT's failures into four types: comprehension, factualness, specificity, and inference. We further pinpoint three critical abilities associated with QA failures: knowledge memorization, knowledge association, and knowledge reasoning. Additionally, we conduct experiments centered on these abilities and propose potential approaches to enhance faithfulness. The results indicate that furnishing the model with fine-grained external knowledge, hints for knowledge association, and guidance for reasoning can empower the model to answer questions more faithfully.
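To make the three mitigation strategies named above concrete, the sketch below shows one way a QA prompt might be augmented with fine-grained external evidence, an association hint, and explicit reasoning guidance. This is a minimal illustration, not the paper's released code; the function name `build_faithful_qa_prompt` and its parameters are hypothetical.

```python
def build_faithful_qa_prompt(question: str, evidence: list[str], hint: str) -> str:
    """Assemble a QA prompt that supplies fine-grained external knowledge,
    a knowledge-association hint, and step-by-step reasoning guidance.

    All names here are illustrative; the abstract does not prescribe an API.
    """
    # Fine-grained external knowledge: list each retrieved fact explicitly.
    evidence_block = "\n".join(f"- {fact}" for fact in evidence)
    return (
        "Answer the question using ONLY the evidence below.\n"
        f"Evidence:\n{evidence_block}\n"
        # Knowledge-association hint: point the model at the relevant link.
        f"Hint: {hint}\n"
        # Reasoning guidance: require grounded, step-by-step inference.
        "Think step by step, citing which evidence supports each step, "
        "then state the final answer.\n"
        f"Question: {question}"
    )

# Example usage with made-up evidence
prompt = build_faithful_qa_prompt(
    question="Which mountain is the second highest on Earth?",
    evidence=["K2 rises to 8,611 m.", "Mount Everest rises to 8,849 m."],
    hint="Compare the listed elevations before answering.",
)
print(prompt)
```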