Recent advances in open-domain QA have led to strong models based on dense retrieval, but these have focused only on retrieving textual passages. In this work, we tackle open-domain QA over tables for the first time, and show that retrieval can be improved by a retriever designed to handle tabular context. We present an effective pre-training procedure for our retriever and improve retrieval quality with mined hard negatives. As relevant datasets are missing, we extract a subset of Natural Questions (Kwiatkowski et al., 2019) into a Table QA dataset. We find that our retriever improves retrieval results from 72.0 to 81.1 recall@10 and end-to-end QA results from 33.8 to 37.7 exact match, over a BERT-based retriever.