This research project addresses errors in financial numerical reasoning Question Answering (QA) tasks that arise from a lack of financial domain knowledge. Despite recent advances in Large Language Models (LLMs), financial numerical questions remain challenging because they require specific domain knowledge in finance as well as complex multi-step numerical reasoning. We implement a multi-retriever Retrieval Augmented Generation (RAG) system to retrieve both external domain knowledge and internal question contexts, and use the latest LLMs to tackle these tasks. Through comprehensive ablation experiments and error analysis, we find that domain-specific training with the SecBERT encoder contributes significantly to our best neural symbolic model surpassing the top model of the FinQA paper, which serves as our baseline, suggesting the potential superiority of domain-specific training. Furthermore, our best prompt-based LLM generator achieves state-of-the-art (SOTA) performance with a significant improvement (>7%), though it still falls short of human expert performance. This study also highlights the trade-off between hallucination loss and external knowledge gains in smaller models and few-shot settings; for larger models, the gains from external facts typically outweigh the hallucination loss. Finally, our findings confirm the enhanced numerical reasoning capabilities of the latest LLMs when optimized for few-shot learning.