Learning Better Representation for Tables by Self-Supervised Tasks

Table-to-text generation aims to automatically generate natural language text that helps people conveniently obtain the important information in tables. Although neural models for table-to-text generation have achieved remarkable progress, some problems are still overlooked. First, in practice the values recorded in many tables are mostly numbers. Existing approaches give these values no special treatment and still regard them as words in natural language text. Second, the target texts in the training dataset may contain redundant information or facts that do not exist in the input tables. These may give wrong supervision signals to methods based on content selection and planning and on auxiliary supervision. To solve these problems, we propose two self-supervised tasks, Number Ordering and Significance Ordering, to help learn better table representations. The former works on the column dimension and helps incorporate the size property of numbers into the table representation. The latter acts on the row dimension and helps learn a significance-aware table representation. We test our methods on the widely used ROTOWIRE dataset, which consists of NBA game statistics and related news. The experimental results demonstrate that a model trained together with these two self-supervised tasks can generate text that contains more salient and well-organized facts, even without modeling content selection and planning. We also achieve state-of-the-art performance on automatic metrics.
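The abstract does not say how the ordering targets for the two self-supervised tasks are built, so the following is only a minimal sketch under stated assumptions. It assumes that Number Ordering asks the model to recover the descending order of the values in each column, and that Significance Ordering ranks the cells of a row by how high each value stands within its own column. The function names, the tie-breaking rule, and the toy table are hypothetical and are not taken from the paper or from ROTOWIRE.

```python
from typing import Dict, List

# Hypothetical table layout: row name -> {column name -> numeric value}.
Table = Dict[str, Dict[str, int]]


def number_ordering_targets(table: Table) -> Dict[str, List[str]]:
    """Column dimension: for each column, list row names from largest to smallest value.

    Assumption: ties are broken alphabetically by row name.
    """
    columns = sorted({col for cells in table.values() for col in cells})
    targets: Dict[str, List[str]] = {}
    for col in columns:
        rows = [row for row in table if col in table[row]]
        targets[col] = sorted(rows, key=lambda row: (-table[row][col], row))
    return targets


def significance_ordering_targets(table: Table) -> Dict[str, List[str]]:
    """Row dimension: for each row, list column names from most to least significant.

    Assumption: a cell is more significant when its value ranks higher
    within its own column; ties are broken alphabetically by column name.
    """
    column_order = number_ordering_targets(table)
    targets: Dict[str, List[str]] = {}
    for row, cells in table.items():
        targets[row] = sorted(
            cells, key=lambda col: (column_order[col].index(row), col)
        )
    return targets


# Toy box-score-like table with made-up numbers.
toy_table: Table = {
    "Player A": {"PTS": 30, "REB": 4, "AST": 9},
    "Player B": {"PTS": 12, "REB": 11, "AST": 2},
    "Player C": {"PTS": 21, "REB": 7, "AST": 5},
}

if __name__ == "__main__":
    print(number_ordering_targets(toy_table))
    print(significance_ordering_targets(toy_table))
```

In a setup like this, the orderings would serve as labels that need no human annotation, since they are computed from the table alone. How the model is trained to predict them is not described in the abstract and is left out of the sketch.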
