Tables on the Web contain a vast amount of knowledge in a structured form. To tap into this valuable resource, we address the problem of table retrieval: answering an information need with a ranked list of tables. We investigate this problem in two different variants, based on how the information need is expressed: as a keyword query or as an existing table ("query-by-table"). The main novel contribution of this work is a semantic table retrieval framework for matching information needs (keyword or table queries) against tables. Specifically, we (i) represent queries and tables in multiple semantic spaces (both discrete sparse and continuous dense vector representations) and (ii) introduce various similarity measures for matching those semantic representations. We consider all possible combinations of semantic representations and similarity measures and use these as features in a supervised learning model. Using two purpose-built test collections based on Wikipedia tables, we demonstrate significant and substantial improvements over state-of-the-art baselines.
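The feature construction described above (every semantic representation paired with every similarity measure) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the space names (`entity_bow`, `word_emb`), the three similarity measures (centroid cosine, maximum and average pairwise cosine), and all function names are assumptions chosen for the example. The abstract does not specify which representations or measures are used.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sim_centroid(q_vecs, t_vecs):
    """Hypothetical measure 1: compare the centroids of the two vector sets."""
    return cosine(centroid(q_vecs), centroid(t_vecs))


def pairwise(q_vecs, t_vecs):
    """Cosine similarity for every (query vector, table vector) pair."""
    return [cosine(q, t) for q, t in product(q_vecs, t_vecs)]


def sim_pair_max(q_vecs, t_vecs):
    """Hypothetical measure 2: best-matching pair."""
    return max(pairwise(q_vecs, t_vecs))


def sim_pair_avg(q_vecs, t_vecs):
    """Hypothetical measure 3: average over all pairs."""
    sims = pairwise(q_vecs, t_vecs)
    return sum(sims) / len(sims)


# Assumed set of measures; the source only says "various similarity measures".
MEASURES = {
    "centroid": sim_centroid,
    "pair_max": sim_pair_max,
    "pair_avg": sim_pair_avg,
}


def semantic_features(query_repr, table_repr):
    """Build one feature per (semantic space, similarity measure) combination.

    Both arguments map a space name to a list of vectors in that space.
    Spaces missing or empty on either side yield 0.0 for all measures.
    """
    features = {}
    for space in sorted(query_repr.keys() | table_repr.keys()):
        q_vecs = query_repr.get(space, [])
        t_vecs = table_repr.get(space, [])
        for name, measure in MEASURES.items():
            value = measure(q_vecs, t_vecs) if q_vecs and t_vecs else 0.0
            features[f"{space}:{name}"] = value
    return features


# Toy data: one sparse space and one dense space (both made up).
query = {"entity_bow": [[1, 0, 0], [0, 1, 0]], "word_emb": [[0.5, 0.5]]}
table = {"entity_bow": [[1, 0, 0]], "word_emb": [[0.5, 0.5], [1.0, 0.0]]}
features = semantic_features(query, table)
```

The resulting dictionary has one entry per combination (here 2 spaces × 3 measures) and would serve as the feature vector for a supervised ranking model. The same function applies whether the information need is a keyword query or a query table, as long as both sides are mapped into the same spaces.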