Several studies have reported the inability of Transformer models to generalize compositionally, a key type of generalization in many NLP tasks such as semantic parsing. In this paper, we explore the design space of Transformer models, showing that the inductive biases given to the model by several design decisions significantly impact compositional generalization. Through this exploration, we identify Transformer configurations that generalize compositionally significantly better than previously reported in the literature on a diverse set of compositional tasks, and that achieve state-of-the-art results on a semantic parsing compositional generalization benchmark (COGS) and a string edit operation composition benchmark (PCFG).