Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models. In this work, we analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias (i.e., a source sequence already mapped to a target sequence is less likely to be mapped to other target sequences), and the tendency to memorize whole examples rather than separating structures from contents. We propose two techniques to address these issues respectively: Mutual Exclusivity Training, which prevents the model from producing seen generations when facing novel, unseen examples via an unlikelihood-based loss; and prim2primX data augmentation, which automatically diversifies the arguments of every syntactic function to prevent memorization and provide a compositional inductive bias without exposing test-set data. Combining these two techniques, we show substantial empirical improvements with standard sequence-to-sequence models (LSTMs and Transformers) on two widely-used compositionality datasets: SCAN and COGS. Finally, we provide analysis characterizing the improvements as well as the remaining challenges, along with detailed ablations of our method. Our code is available at https://github.com/owenzx/met-primaug
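To make the unlikelihood-based loss concrete, the sketch below shows how such a penalty can be computed at a single decoding step. This is an illustrative assumption in PyTorch, not the authors' exact Mutual Exclusivity Training objective: the function name `unlikelihood_penalty`, its shapes, and the choice of negative tokens are hypothetical, following the general unlikelihood-training idea of penalizing probability mass placed on forbidden tokens.

```python
# A minimal sketch, assuming a PyTorch seq2seq decoder. Illustrative only;
# the function and its inputs are hypothetical, not the paper's implementation.
import torch
import torch.nn.functional as F

def unlikelihood_penalty(logits: torch.Tensor, negative_ids: torch.Tensor) -> torch.Tensor:
    """Penalize probability mass placed on 'seen' target tokens.

    logits:       (batch, vocab) decoder scores for one time step.
    negative_ids: (batch, k) token ids the model should be discouraged from
                  producing here (e.g., tokens from target sequences already
                  mapped to other source sequences).
    """
    probs = F.softmax(logits, dim=-1)           # (batch, vocab)
    neg_probs = probs.gather(1, negative_ids)   # (batch, k)
    # -log(1 - p) is near zero when the model avoids a forbidden token and
    # grows as the model concentrates probability mass on it.
    return -torch.log((1.0 - neg_probs).clamp(min=1e-6)).sum(dim=-1).mean()

# Toy usage: 2 examples, vocabulary of 5, discourage tokens 1 and 3.
logits = torch.randn(2, 5, requires_grad=True)
negative_ids = torch.tensor([[1, 3], [1, 3]])
loss = unlikelihood_penalty(logits, negative_ids)
loss.backward()
```

In practice such a term would be added, with some weight, to the standard cross-entropy (likelihood) loss, so the model is trained to produce the correct target while being discouraged from reusing previously seen generations for novel inputs.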