High-quality Web tables are rich sources of information that can be used to populate Knowledge Graphs (KGs). This paper evaluates methods for table-to-class annotation, a sub-task of Table Interpretation (TI). We provide a formal definition of table classification as a machine learning task, propose an experimental setup, and evaluate five fundamentally different approaches to find the best method for generating vector table representations. Our findings indicate that although transfer learning methods achieve high F1 scores on the table classification task, dedicated table encoding models are a promising direction, as they appear to capture richer semantics.