Compositional generalization is a troubling blind spot for neural language models. Recent efforts have presented techniques for improving a model's ability to encode novel combinations of known inputs, but less work has focused on generating novel combinations of known outputs. Here we focus on this latter "decode-side" form of generalization in the context of gSCAN, a synthetic benchmark for compositional generalization in grounded language understanding. We present Recursive Decoding (RD), a novel procedure for training and using seq2seq models, targeted towards decode-side generalization. Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time. Inputs (i.e., the external gSCAN environment) are then incrementally updated based on predicted tokens, and re-encoded for the next decoder time step. RD thus decomposes a complex, out-of-distribution sequence generation task into a series of incremental predictions that each resemble what the model has already seen during training. RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN. We provide analyses to elucidate these gains over the failure of a baseline, and then discuss implications for generalization in naturalistic grounded language understanding, and for seq2seq more generally.
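The decode-and-re-encode loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `encode_fn`, `decode_step_fn`, and `apply_token_to_env` are hypothetical stand-ins for the model's encoder, its single-step decoder, and the gSCAN environment update, respectively.

```python
def recursive_decode(command, env, encode_fn, decode_step_fn,
                     apply_token_to_env, eos="<EOS>", max_steps=50):
    """Generate one action token at a time, re-encoding the (updated)
    environment before every decoder step, as in Recursive Decoding."""
    output = []
    for _ in range(max_steps):
        state = encode_fn(command, env)       # re-encode current environment
        token = decode_step_fn(state)         # predict a single token
        if token == eos:
            break
        output.append(token)
        env = apply_token_to_env(env, token)  # incrementally update the world
    return output

# Toy usage: the "environment" is an agent's x-position; each predicted
# "walk" token moves the agent one cell toward a goal at x = 3.
goal = 3
encode = lambda cmd, env: (cmd, env)
step = lambda state: "walk" if state[1] < goal else "<EOS>"
move = lambda env, tok: env + 1
print(recursive_decode("walk to the square", 0, encode, step, move))
# → ['walk', 'walk', 'walk']
```

Because each prediction conditions on a freshly encoded environment, an out-of-distribution target sequence is reduced to a series of single-step predictions, each of which resembles the in-distribution cases seen during training.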