Tables are widely used in documents because they represent information in a compact and structured way. In scientific papers in particular, tables present novel discoveries and summarize experimental results, making research comparable and easily understandable by scholars. Since the layout of tables is highly variable, it would be useful to interpret their content and classify them into categories. This could help extract information directly from scientific papers, for instance to compare the performance of models using the result tables reported in their papers. In this work, we address the classification of tables using a Graph Neural Network, exploiting the table structure in the message passing algorithm. We evaluate our model on a subset of the Tab2Know dataset. Since it contains few manually annotated examples, we propose data augmentation techniques applied directly to the table graph structures. We achieve promising preliminary results and propose a data augmentation method suitable for graph-based table representations.
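To make the idea of message passing over a table structure concrete, the following is a minimal sketch and not the model described above. It assumes that each cell is a node, that edges link horizontally and vertically adjacent cells, and that one message passing round is a plain mean over a cell and its neighbours. The function names (`build_table_graph`, `message_passing_step`, `drop_row`) and the row-dropping augmentation are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) position of a table cell


def build_table_graph(n_rows: int, n_cols: int) -> Dict[Cell, List[Cell]]:
    """Assumed construction: link each cell to its vertical and horizontal neighbours."""
    graph: Dict[Cell, List[Cell]] = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < n_rows and 0 <= nc < n_cols:
                    neighbours.append((nr, nc))
            graph[(r, c)] = neighbours
    return graph


def message_passing_step(
    graph: Dict[Cell, List[Cell]], features: Dict[Cell, List[float]]
) -> Dict[Cell, List[float]]:
    """One round of mean aggregation over each cell and its neighbours."""
    updated = {}
    for node, neighbours in graph.items():
        group = [features[node]] + [features[n] for n in neighbours]
        dim = len(features[node])
        updated[node] = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    return updated


def drop_row(
    graph: Dict[Cell, List[Cell]], features: Dict[Cell, List[float]], row: int
) -> Tuple[Dict[Cell, List[Cell]], Dict[Cell, List[float]]]:
    """Hypothetical augmentation: remove every cell of one row, plus its edges."""
    kept = {n for n in graph if n[0] != row}
    new_graph = {n: [m for m in graph[n] if m in kept] for n in kept}
    new_features = {n: features[n] for n in kept}
    return new_graph, new_features


# Toy 2x3 table: the single feature marks whether a cell is in the header row.
graph = build_table_graph(2, 3)
features = {cell: [1.0 if cell[0] == 0 else 0.0] for cell in graph}
updated = message_passing_step(graph, features)
aug_graph, aug_features = drop_row(graph, features, row=1)
```

A real Graph Neural Network would replace the plain mean with learned transformations and pool the node features into a table-level class prediction; the sketch only shows how the table layout determines which cells exchange information.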