Language models suffer from various degenerate behaviors. These differ between tasks: machine translation (MT) exhibits length bias, while tasks like story generation exhibit excessive repetition. Recent work has attributed the difference to task constrainedness, but evidence for this claim has always involved many confounding variables. To study this question directly, we introduce a new experimental framework that allows us to smoothly vary task constrainedness, from MT at one end to fully open-ended generation at the other, while keeping all other aspects fixed. We find that: (1) repetition decreases smoothly with constrainedness, explaining the difference in repetition across tasks; (2) length bias surprisingly also decreases with constrainedness, suggesting some other cause for the difference in length bias; (3) across the board, these problems affect the mode, not the whole distribution; (4) the differences cannot be attributed to a change in the entropy of the distribution, since another method of changing the entropy, label smoothing, does not produce the same effect.
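Point (4) refers to label smoothing as an alternative way of changing the entropy of the output distribution. As a minimal illustrative sketch (not the paper's implementation), label smoothing mixes the one-hot training target with a uniform distribution over the vocabulary, which raises the entropy of the target the model is trained to match:

```python
import numpy as np

def label_smooth(one_hot: np.ndarray, epsilon: float = 0.1) -> np.ndarray:
    """Mix a one-hot target with the uniform distribution.

    Illustrative sketch: with smoothing weight `epsilon`, the true class
    keeps probability (1 - epsilon) + epsilon/V and every other class
    gets epsilon/V, where V is the vocabulary size. This raises the
    entropy of the training target without changing the task itself.
    """
    vocab_size = one_hot.shape[-1]
    return (1.0 - epsilon) * one_hot + epsilon / vocab_size

# Example: vocabulary of 4 tokens, true class is index 2.
target = np.array([0.0, 0.0, 1.0, 0.0])
smoothed = label_smooth(target, epsilon=0.1)
# smoothed -> [0.025, 0.025, 0.925, 0.025]
```

The smoothed target still sums to 1 and keeps most mass on the true class; `epsilon` and the function name here are illustrative choices, not values from the paper.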