Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently. Specifically, compositional generalization, i.e., whether a model generalizes to new structures built from components observed during training, has sparked substantial interest. In this work, we investigate compositional generalization in semantic parsing, a natural test-bed for compositional generalization, as output programs are constructed from sub-components. We analyze a wide variety of models and propose multiple extensions to the attention module of the semantic parser, aiming to improve compositional generalization. We find that the following factors improve compositional generalization: (a) using contextual representations, such as ELMo and BERT, (b) informing the decoder which input tokens have previously been attended to, (c) training the decoder attention to agree with pre-computed token alignments, and (d) downsampling examples corresponding to frequent program templates. While we substantially reduce the gap between in-distribution and OOD generalization, performance on OOD compositions remains markedly lower.
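For concreteness, below is a minimal sketch of one such attention extension, factor (c): an auxiliary loss that pushes decoder attention mass onto pre-computed source-target token alignments. The tensor shapes, the function name, and the use of a binary alignment matrix are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def attention_agreement_loss(attn: torch.Tensor, align: torch.Tensor,
                             eps: float = 1e-8) -> torch.Tensor:
    """Auxiliary loss encouraging decoder attention to agree with alignments.

    attn:  (T, S) attention distribution over S source tokens at each of
           T decoding steps (each row sums to 1).
    align: (T, S) binary matrix of pre-computed token alignments
           (hypothetical; e.g., produced by an external word aligner).
    """
    # Attention mass placed on aligned source tokens at each decoding step.
    mass_on_aligned = (attn * align).sum(dim=-1)   # shape (T,)
    # Only supervise steps that have at least one aligned source token.
    supervised = align.sum(dim=-1) > 0             # shape (T,), bool
    if not supervised.any():
        return attn.new_zeros(())
    # Negative log of the aligned mass: minimized when all attention
    # falls on aligned tokens.
    return -(mass_on_aligned[supervised] + eps).log().mean()
```

In training, a term like this would be added to the standard sequence-to-sequence loss with a weighting coefficient; the exact supervision objective used in the paper may differ.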