The ability to extrapolate, i.e., to make predictions on sequences that are longer than those presented as training examples, is a challenging problem for current deep learning models. Recent work shows that this limitation persists in state-of-the-art Transformer-based models. Most solutions to this problem use specific architectures or training methods that do not generalize to other tasks. We demonstrate that large language models can succeed in extrapolation without modifying their architecture or training procedure. Our experimental results show that generating step-by-step rationales and introducing markup tokens are both required for effective extrapolation. First, to effectively communicate the task to the model, we induce it to produce step-by-step rationales before outputting the answer. However, as sequences become longer, we find that current models struggle to keep track of token positions. To address this issue, we interleave output tokens with markup tokens that act as explicit positional and counting symbols. Our findings show how these two complementary approaches enable remarkable sequence extrapolation and highlight a limitation of current architectures: they cannot generalize effectively without explicit surface-form guidance. Code is available at https://github.com/MirelleB/induced-rationales-markup-tokens.
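To make the markup-token idea concrete, below is a minimal sketch of what interleaving output tokens with explicit positional markers could look like. This is not the authors' released code; the `<i>` marker format and the function name are illustrative assumptions, and the actual scheme used in the paper may differ.

```python
# Minimal sketch: interleave output tokens with explicit positional
# markup tokens, as described in the abstract. The "<i>" marker format
# is an assumption for illustration, not the paper's exact scheme.

def interleave_with_markers(tokens):
    """Prefix each output token with a counting marker like '<1>', '<2>', ...

    The markers give the model an explicit surface-form signal of each
    token's position, which the abstract argues helps it keep track of
    positions on sequences longer than those seen during training.
    """
    return " ".join(f"<{i}> {tok}" for i, tok in enumerate(tokens, start=1))


if __name__ == "__main__":
    # Example: digits of an answer, each tagged with its position.
    answer_tokens = ["7", "2", "9"]
    print(interleave_with_markers(answer_tokens))
    # Output: <1> 7 <2> 2 <3> 9
```

Under this formatting, the model emits (and is trained or prompted to emit) the position markers alongside the content tokens, so counting is carried in the surface form rather than left implicit in the architecture's positional encodings.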