In open-domain table-to-text generation, unfaithful output usually contains hallucinated content that cannot be aligned to any input table record. We therefore evaluate generation faithfulness with two entity-centric metrics: table record coverage and the ratio of hallucinated entities in the text, both of which show strong agreement with human judgements. Using these metrics, we quantitatively analyze the correlation between training data quality and generation fidelity, which indicates that entity information is potentially useful for faithful generation. Motivated by these findings, we propose two methods for faithful generation: 1) augmented training that incorporates auxiliary entity information, covering both an augmented plan-based model and an unsupervised model, and 2) training instance selection based on faithfulness ranking. We show that these approaches improve generation fidelity in both the full-dataset setting and few-shot learning settings, according to both automatic and human evaluations.
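To make the two entity-centric metrics and the ranking-based instance selection concrete, the following is a minimal sketch and not the definitions used in this work. It assumes that a table is a flat mapping from attribute names to string values, that entities have already been extracted from the text by some external tagger, and that alignment is plain case-insensitive substring matching. The function names `record_coverage`, `hallucination_ratio`, and `select_faithful`, as well as the `keep_fraction` parameter, are hypothetical.

```python
from typing import Dict, List


def record_coverage(table: Dict[str, str], text: str) -> float:
    """Fraction of table records whose value appears in the text.

    Assumption: a record counts as covered if its value occurs as a
    case-insensitive substring of the text.
    """
    if not table:
        return 0.0
    lowered = text.lower()
    covered = sum(1 for value in table.values() if value.lower() in lowered)
    return covered / len(table)


def hallucination_ratio(table: Dict[str, str], entities: List[str]) -> float:
    """Fraction of text entities that align to no table value.

    Assumption: an entity is aligned if it contains, or is contained in,
    some non-empty table value (case-insensitive).
    """
    if not entities:
        return 0.0
    values = [v.lower() for v in table.values() if v]
    unaligned = 0
    for entity in entities:
        e = entity.lower()
        if not any(e in v or v in e for v in values):
            unaligned += 1
    return unaligned / len(entities)


def select_faithful(instances: List[dict], keep_fraction: float = 0.5) -> List[dict]:
    """Keep the training instances with the lowest hallucination ratio.

    Assumption: each instance is a dict with a "table" mapping and an
    "entities" list taken from its reference text.
    """
    ranked = sorted(
        instances,
        key=lambda inst: hallucination_ratio(inst["table"], inst["entities"]),
    )
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]
```

In this sketch, a reference text that mentions an entity absent from the table raises the hallucination ratio of its training instance, so `select_faithful` ranks that instance lower. A real system would likely need fuzzier alignment (for example, handling paraphrases and number formats) than substring matching provides.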