Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
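To make the N-step unroll concrete, below is a minimal, hypothetical PyTorch sketch. The GRU stand-in decoder, the soft-embedding feedback, and all sizes are illustrative assumptions rather than the paper's exact setup; the point is only the structure: step 1 is ordinary teacher forcing, and steps 2..N re-feed the previous step's predictions along a secondary time axis so the loss covers N prediction steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes, for illustration only.
VOCAB, DIM, N_STEPS = 100, 32, 3

embed = nn.Embedding(VOCAB, DIM)
decoder = nn.GRU(DIM, DIM, batch_first=True)  # stands in for any decoder
project = nn.Linear(DIM, VOCAB)

def teaforn_loss(gold, n_steps=N_STEPS):
    """Sketch of a TeaForN-style unroll: the first pass is standard
    teacher forcing; each later pass re-feeds the previous pass's
    (soft) predictions, so gradients flow across N prediction steps."""
    inputs = embed(gold[:, :-1])          # gold prefix as first-step input
    total = 0.0
    for k in range(1, n_steps + 1):
        hidden, _ = decoder(inputs)       # same shared decoder every pass
        logits = project(hidden)
        # pass k predicts the token k positions ahead of each input position
        targets = gold[:, k:]
        steps = targets.size(1)
        total = total + F.cross_entropy(
            logits[:, :steps].reshape(-1, VOCAB), targets.reshape(-1))
        # feed the expected embedding of the prediction to the next pass
        # (one differentiable feedback choice; an assumption here)
        inputs = torch.softmax(logits, dim=-1) @ embed.weight
    return total / n_steps

loss = teaforn_loss(torch.randint(0, VOCAB, (4, 12)))
loss.backward()
```

Note the minimal departure from teacher forcing: with `n_steps=1` the loop reduces exactly to the standard setup, which is consistent with the claim that TeaForN requires only small modifications.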