Exposure bias has been regarded as a central problem for auto-regressive language models (LMs). The claim is that teacher forcing causes test-time generation to become incrementally distorted due to the discrepancy between training and generation. Although many algorithms have been proposed to avoid teacher forcing and thereby alleviate exposure bias, little work has examined how serious the exposure bias problem actually is. In this work, we focus on the task of open-ended language generation and propose metrics to quantify the impact of exposure bias on quality, diversity, and consistency. Our key intuition is that if we feed ground-truth data prefixes (instead of prefixes generated by the model itself) into the model and ask it to continue the generation, performance should become much better, because the training-generation discrepancy in the prefix is removed. We conduct both automatic and human evaluations in our experiments. Contrary to the popular belief about exposure bias, we find that the distortion induced by the prefix discrepancy is limited and does not appear to accumulate during generation. Moreover, our analysis reveals an interesting self-recovery ability of the LM, which we hypothesize counters the harmful effects of exposure bias.
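To make the prefix-swapping protocol concrete, the following is a minimal sketch of the comparison described above, assuming Hugging Face's GPT-2 as a stand-in LM and a distinct-2 statistic as a crude diversity proxy; the model name, prefix text, sampling settings, and metric are illustrative choices, not the paper's exact setup.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continue_from(prefix_ids: torch.Tensor, max_new_tokens: int = 50) -> torch.Tensor:
    """Sample a continuation of `prefix_ids` from the model."""
    with torch.no_grad():
        out = model.generate(
            prefix_ids,
            do_sample=True,
            top_k=40,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    return out[:, prefix_ids.shape[1]:]  # keep only the newly generated tokens

def distinct_2(ids: torch.Tensor) -> float:
    """Crude diversity proxy: fraction of unique bigrams in a continuation."""
    toks = ids[0].tolist()
    bigrams = list(zip(toks, toks[1:]))
    return len(set(bigrams)) / max(len(bigrams), 1)

# Condition on a ground-truth prefix (a hand-written stand-in for corpus data).
gt_prefix = tokenizer("The committee released its annual report on",
                      return_tensors="pt").input_ids
cont_gt = continue_from(gt_prefix)

# Condition on a model-generated prefix of comparable length, sampled from scratch.
bos = torch.tensor([[tokenizer.bos_token_id]])
model_prefix = model.generate(bos, do_sample=True, top_k=40,
                              max_new_tokens=gt_prefix.shape[1],
                              pad_token_id=tokenizer.eos_token_id)
cont_model = continue_from(model_prefix)

# If exposure bias were severe, continuations from model-generated prefixes
# should score notably worse than those from ground-truth prefixes.
print("distinct-2 (ground-truth prefix):", distinct_2(cont_gt))
print("distinct-2 (model prefix):      ", distinct_2(cont_model))
```

In this sketch, a large and consistent gap between the two scores (repeated over many prefixes and over metrics for quality and consistency as well) would indicate that the prefix discrepancy distorts generation; the paper's finding is that this gap is limited.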