Long-context question answering (QA) tasks require reasoning over a long document or multiple documents. Addressing these tasks often benefits from identifying a set of evidence spans (e.g., sentences) that provide supporting evidence for answering the question. In this work, we propose a novel method for equipping long-context QA models with an additional sequence-level objective for better identification of supporting evidence. We achieve this via an additional contrastive supervision signal in finetuning, where the model is encouraged to explicitly discriminate supporting evidence sentences from negative ones by maximizing question-evidence similarity. The proposed additional loss yields consistent improvements on three different strong long-context transformer models, across two challenging question answering benchmarks -- HotpotQA and QAsper.
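To make the objective concrete, below is a minimal sketch of one plausible form of the described contrastive supervision signal: an InfoNCE-style loss that pushes a pooled question representation toward pooled representations of supporting-evidence sentences and away from the remaining (negative) sentences. The function name, pooling choice, and temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_evidence_loss(question_repr: torch.Tensor,
                              sentence_reprs: torch.Tensor,
                              evidence_mask: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss: score evidence sentences above negatives.

    question_repr:  (hidden,)            pooled question representation
    sentence_reprs: (num_sents, hidden)  pooled candidate-sentence representations
    evidence_mask:  (num_sents,)         bool, True for supporting-evidence sentences
    """
    # Cosine similarity between the question and every candidate sentence.
    q = F.normalize(question_repr, dim=-1)
    s = F.normalize(sentence_reprs, dim=-1)
    sims = s @ q / temperature  # (num_sents,)

    # Softmax over all candidates; maximizing the log-probability of the
    # evidence sentences raises their similarity relative to the negatives.
    log_probs = F.log_softmax(sims, dim=-1)
    return -log_probs[evidence_mask].mean()

# Example: 5 candidate sentences, of which sentences 1 and 3 are evidence.
q = torch.randn(768)
sents = torch.randn(5, 768)
mask = torch.tensor([False, True, False, True, False])
loss = contrastive_evidence_loss(q, sents, mask)
```

In finetuning, such a term would typically be combined with the standard answer-prediction loss via a tuned weight, e.g. `total_loss = qa_loss + lam * evidence_loss`; the abstract does not specify the exact combination, so this weighting is an assumption.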