Numerical tables are widely used to present experimental results in scientific papers. To understand such a table, it is essential to know the metric-type of each number, since the metric-type is what distinguishes one number in the table from another. We introduce a new information extraction task, metric-type identification from multi-level header numerical tables, and provide a dataset extracted from scientific papers consisting of header tables, captions, and metric-types. We then propose two joint-learning schemes that combine neural classification and generation, featuring pointer-generator-based and BERT-based models. Our results show that the joint models can handle both in-header and out-of-header metric-type identification.