Tables are an important form of structured data for human and machine readers alike, providing answers to questions that cannot be found, or cannot easily be found, in text. Recent work has designed special models and training paradigms for table-related tasks such as table-based question answering and table retrieval. Though effective, these approaches add complexity in both modeling and data acquisition compared to generic text solutions, and they obscure which elements are truly beneficial. In this work, we focus on the task of table retrieval and ask: "Is table-specific model design necessary for table retrieval, or can a simpler text-based model be used effectively to achieve a similar result?" First, we analyze a table-based portion of the Natural Questions dataset (NQ-table) and find that structure plays a negligible role in more than 70% of the cases. Based on this, we experiment with a general, text-based Dense Passage Retriever (DPR) and a specialized Dense Table Retriever (DTR) that uses table-specific model designs. We find that DPR performs well without any table-specific design or training, and even achieves superior results compared to DTR when fine-tuned on properly linearized tables. We then experiment with three modules that explicitly encode table structure, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases. However, none of these yields significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
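To make the two ideas above concrete, the following minimal sketch shows one way a table could be linearized into plain text for a text-based retriever, and how row/column ids could drive a hard attention mask. The abstract does not specify the linearization format or the mask definition, so everything here is an assumption for illustration: the `[SEP]` and `|` separators, whitespace tokenization, the rule that a token may attend only to tokens in the same row or column, and the function names `linearize_table`, `cell_positions`, and `structure_mask`.

```python
from typing import List, Tuple


def linearize_table(title: str, header: List[str], rows: List[List[str]]) -> str:
    """Flatten a table into one string: title, header, then each row.

    The " [SEP] " and " | " separators are illustrative assumptions,
    not the format used in the paper.
    """
    parts = [title, " | ".join(header)]
    parts.extend(" | ".join(row) for row in rows)
    return " [SEP] ".join(parts)


def cell_positions(header: List[str], rows: List[List[str]]) -> List[Tuple[str, int, int]]:
    """Return (token, row_id, col_id) for every whitespace token in the table.

    The header is treated as row 0; data rows start at row 1.
    """
    positions = []
    for r, row in enumerate([header] + rows):
        for c, cell in enumerate(row):
            for token in cell.split():
                positions.append((token, r, c))
    return positions


def structure_mask(positions: List[Tuple[str, int, int]]) -> List[List[bool]]:
    """Hard mask (assumed rule): token i may attend to token j only if
    they share a row or a column."""
    return [
        [pi[1] == pj[1] or pi[2] == pj[2] for pj in positions]
        for pi in positions
    ]


# Small hypothetical table used only to exercise the functions.
title = "World Cup champions"
header = ["Year", "Winner"]
rows = [["2018", "France"], ["2014", "Germany"]]

text = linearize_table(title, header, rows)
positions = cell_positions(header, rows)
mask = structure_mask(positions)
```

The string in `text` is what a generic text retriever would encode, while `positions` and `mask` stand in for the extra structural signals (row/column ids, hard attention masks) that a table-specific encoder would consume.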