Table Retrieval May Not Necessitate Table-specific Model Design

Tables are an important form of structured data for human and machine readers alike, providing answers to questions that cannot, or cannot easily, be found in text. Recent work has designed special models and training paradigms for table-related tasks such as table-based question answering and table retrieval. Though effective, they add complexity in both modeling and data acquisition compared to generic text solutions and obscure which elements are truly beneficial. In this work, we focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval, or can a simpler text-based model be effectively used to achieve a similar result?" First, we perform an analysis on a table-based portion of the Natural Questions dataset (NQ-table), and find that structure plays a negligible role in more than 70% of the cases. Based on this, we experiment with a general Dense Passage Retriever (DPR) based on text and a specialized Dense Table Retriever (DTR) that uses table-specific model designs. We find that DPR performs well without any table-specific design or training, and even achieves superior results compared to DTR when fine-tuned on properly linearized tables. We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases. However, none of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
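
To make the contrast concrete, the sketch below illustrates two ideas named in the abstract: flattening a table into plain text for a generic text retriever, and a hard attention mask that lets a cell attend only to cells in its own row or column. The abstract specifies neither the linearization format nor the mask rule, so the function names, the ` | ` and ` [SEP] ` delimiters, and the same-row-or-column rule are all illustrative assumptions, not the authors' implementation.

```python
from typing import List


def linearize_table(title: str, header: List[str], rows: List[List[str]]) -> str:
    """Flatten a table into one string: title, header, then one segment per row.

    The delimiters (" | " between cells, " [SEP] " between segments) are
    assumptions chosen for illustration only.
    """
    parts = [title, " | ".join(header)]
    parts.extend(" | ".join(row) for row in rows)
    return " [SEP] ".join(parts)


def same_row_or_column_mask(n_rows: int, n_cols: int) -> List[List[bool]]:
    """Build a cell-level hard attention mask (hypothetical rule).

    Cells are enumerated in row-major order. Entry [i][j] is True when
    cell i may attend to cell j, i.e. when they share a row or a column.
    """
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return [[a[0] == b[0] or a[1] == b[1] for b in cells] for a in cells]


if __name__ == "__main__":
    text = linearize_table(
        "Cities",
        ["City", "Country"],
        [["Paris", "France"], ["Rome", "Italy"]],
    )
    print(text)
    for mask_row in same_row_or_column_mask(2, 2):
        print(mask_row)
```

The linearized string can be passed to any text encoder unchanged, which is the sense in which a text-based retriever needs no table-specific design. The mask, by contrast, requires the model to track each token's row and column, which is the kind of added complexity the abstract questions.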
