Fact verification (FV) is a challenging task that aims to verify a claim using multiple evidential sentences from trustworthy corpora, e.g., Wikipedia. Most existing approaches follow a three-step pipeline framework consisting of document retrieval, sentence retrieval, and claim verification. High-quality evidence provided by the first two steps is the foundation for effective reasoning in the last step. Despite its importance, high-quality evidence is rarely studied in existing FV work, which often adopts off-the-shelf models to retrieve relevant documents and sentences in an "index-retrieve-then-rank" fashion. This classical approach has clear drawbacks: i) a large document index and a complicated search process are required, leading to considerable memory and computational overhead; ii) independent scoring paradigms fail to capture the interactions among documents and sentences during ranking; iii) a fixed number of sentences is selected to form the final evidence set. In this work, we propose GERE, the first system that retrieves evidence in a generative fashion, i.e., by generating document titles as well as evidence sentence identifiers. This mitigates the aforementioned technical issues because: i) memory and computational costs are greatly reduced, since the document index is eliminated and the heavy ranking process is replaced by a light generative process; ii) the dependencies among documents and among sentences can be captured via a sequential generation process; iii) the generative formulation allows us to dynamically select a precise set of relevant evidence for each claim. Experimental results on the FEVER dataset show that GERE achieves significant improvements over state-of-the-art baselines while being both time- and memory-efficient.
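The abstract does not spell out the decoding procedure, so the following is only a minimal sketch, under our own assumptions, of two properties claimed above: each generated title is conditioned on the titles already generated (point ii), and the model itself decides how many titles to emit by producing a stop token rather than returning a fixed top-k (point iii). It assumes decoding is restricted to valid titles with a prefix tree, a technique from earlier generative entity retrieval work that the abstract does not confirm for GERE. `TitleTrie`, `generate_titles`, the `EOS`/`STOP` tokens, and `toy_score` (a hand-written stand-in for a trained seq2seq model) are all hypothetical names, and sentence-identifier generation is omitted.

```python
from typing import Callable, Dict, List, Sequence

EOS = "</s>"     # hypothetical token: ends one title
STOP = "<stop>"  # hypothetical token: ends the whole set of titles

# score(selected_titles, current_prefix, candidate_token) -> float
# Stands in for a seq2seq model's next-token score (assumption).
ScoreFn = Callable[[List[List[str]], List[str], str], float]


class TitleTrie:
    """Hypothetical prefix tree over tokenized titles; restricts decoding to valid titles."""

    def __init__(self, titles: Sequence[List[str]]) -> None:
        self.root: Dict[str, dict] = {}
        for tokens in titles:
            node = self.root
            for tok in list(tokens) + [EOS]:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix: List[str]) -> List[str]:
        """Tokens that can follow `prefix` while staying inside some valid title."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []
            node = node[tok]
        return list(node.keys())


def generate_titles(trie: TitleTrie, score: ScoreFn, max_titles: int = 5) -> List[List[str]]:
    """Greedy constrained decoding of a variable-length list of titles."""
    selected: List[List[str]] = []
    while len(selected) < max_titles:
        prefix: List[str] = []
        while True:
            options = trie.allowed_next(prefix)
            if not prefix:
                options = options + [STOP]  # stopping is only allowed between titles
            # Every choice sees the titles generated so far (sequential dependency).
            tok = max(options, key=lambda t: score(selected, prefix, t))
            if tok in (EOS, STOP):
                break
            prefix.append(tok)
        if tok == STOP:
            break  # the model chose how many titles to return
        selected.append(prefix)
    return selected


def toy_score(selected: List[List[str]], prefix: List[str], tok: str) -> float:
    """Hand-written scorer that 'wants' two titles, then stops (illustration only)."""
    wanted = [["Barack", "Obama"], ["Hawaii"]]
    if not prefix and tok == STOP:
        return 1.0 if len(selected) >= len(wanted) else -1.0
    target = wanted[len(selected)] if len(selected) < len(wanted) else []
    pos = len(prefix)
    if pos < len(target) and tok == target[pos]:
        return 2.0
    if pos == len(target) and tok == EOS:
        return 2.0
    return 0.0


trie = TitleTrie([["Barack", "Obama"], ["Barack", "Obama", "Sr."], ["Hawaii"], ["Honolulu"]])
print(generate_titles(trie, toy_score))
```

With this toy scorer the decoder returns two titles and then emits `STOP`; a different claim (a different scorer) could yield one title or four, which is the contrast with selecting a fixed number of items per claim. In a real system the scorer would be a trained model's conditional next-token probability, and beam search would typically replace the greedy `max`.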