Long-form numerical reasoning in financial analysis aims to generate a reasoning program that calculates the correct answer to a given question. Previous work follows a retriever-generator framework, where the retriever selects key facts from a long-form document and the generator produces a reasoning program based on the retrieved facts. However, these methods treat all facts equally, ignoring the different contributions of facts with and without numbers. Moreover, program consistency is ignored under supervised training, resulting in lower training accuracy and diversity. To address these problems, we propose APOLLO to improve the long-form numerical reasoning framework. For the retriever, we adopt a number-aware negative sampling strategy to make the retriever more discriminative on key numerical facts. For the generator, we design consistency-based reinforcement learning and a target program augmentation strategy based on the consistency of program execution results. Experimental results on the FinQA and ConvFinQA leaderboards verify the effectiveness of our proposed method, achieving new state-of-the-art performance.
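To make the number-aware negative sampling idea concrete, the following is a minimal sketch of one possible implementation, assuming facts are plain strings, gold supporting facts are annotated, and that the sampling ratio, seed, and function names are illustrative choices rather than APOLLO's actual settings.

```python
import random
import re

def contains_number(fact: str) -> bool:
    """Heuristic check for numeric content (any digit) in a fact."""
    return bool(re.search(r"\d", fact))

def number_aware_negatives(facts, gold_facts, k=8, number_ratio=0.75, seed=0):
    """Sample k negative facts, preferring facts that contain numbers.

    facts: all candidate facts extracted from the document
    gold_facts: facts labeled as supporting the question
    number_ratio: fraction of negatives drawn from number-bearing facts (assumed)
    """
    rng = random.Random(seed)
    gold = set(gold_facts)
    candidates = [f for f in facts if f not in gold]
    numeric = [f for f in candidates if contains_number(f)]
    plain = [f for f in candidates if not contains_number(f)]

    # Draw most negatives from numerical facts so the retriever must learn
    # to distinguish key numbers from distractor numbers.
    n_numeric = min(len(numeric), int(k * number_ratio))
    negatives = rng.sample(numeric, n_numeric)
    remaining = k - len(negatives)
    negatives += rng.sample(plain, min(len(plain), remaining))
    return negatives
```

In this sketch, harder number-bearing negatives dominate the contrastive signal, which is the intuition behind making the retriever more discriminative on numerical facts.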
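The consistency of program execution results can likewise be illustrated with a toy executor for FinQA-style programs; the operator set, tolerance, and helper names below are assumptions for illustration, not the paper's implementation.

```python
import math

OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def execute(program):
    """Execute a FinQA-style program given as steps like ('divide', '5', '#0'),
    where '#i' refers to the result of step i. Returns the final value."""
    results = []
    for op, a, b in program:
        x = results[int(a[1:])] if a.startswith("#") else float(a)
        y = results[int(b[1:])] if b.startswith("#") else float(b)
        results.append(OPS[op](x, y))
    return results[-1]

def consistent(p1, p2, tol=1e-5):
    """Treat two programs as consistent if their execution results match."""
    return math.isclose(execute(p1), execute(p2), rel_tol=tol, abs_tol=tol)

# An operand-swapped variant of a commutative step executes to the same value,
# so it could serve as an additional training target under this criterion.
print(consistent([("add", "2", "3")], [("add", "3", "2")]))  # True
```

A consistency check of this kind can both reward generated programs whose results agree with the gold answer and admit equivalent program rewrites as augmented targets.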