Generic unstructured neural networks have been shown to struggle on out-of-distribution compositional generalization. Compositional data augmentation via example recombination has transferred some prior knowledge about compositionality to such black-box neural models for several semantic parsing tasks, but this often required task-specific engineering or provided limited gains. We present a more powerful data recombination method using a model called Compositional Structure Learner (CSL). CSL is a generative model with a quasi-synchronous context-free grammar backbone, which we induce from the training data. We sample recombined examples from CSL and add them to the fine-tuning data of a pre-trained sequence-to-sequence model (T5). This procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks, and yields a model even stronger than a T5-CSL ensemble on two real-world compositional generalization tasks. The result is new state-of-the-art performance on these challenging semantic parsing tasks, which require generalization to both natural language variation and novel compositions of elements.
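The core augmentation idea can be illustrated with a toy sketch: sample paired (natural language, logical form) examples from a small synchronized grammar and treat them as extra fine-tuning data. The grammar, symbols, and helper names below are illustrative assumptions, not the paper's actual induced CSL grammar or induction algorithm.

```python
import random

# Toy quasi-synchronous grammar (assumed for illustration): each rule maps a
# nonterminal to a pair of right-hand sides, one for the source (natural
# language) side and one for the target (logical form) side. Symbols that
# appear in RULES are nonterminals; anything else is a terminal token.
RULES = {
    "S":  [(["NP", "VP"], ["VP", "(", "NP", ")"])],
    "NP": [(["the cat"], ["cat"]), (["the dog"], ["dog"])],
    "VP": [(["sleeps"], ["sleep"]), (["runs"], ["run"])],
}

def sample(symbol, rng):
    """Sample a synchronized (source, target) string pair for `symbol`."""
    if symbol not in RULES:          # terminal: identical on both sides
        return symbol, symbol
    src_rhs, tgt_rhs = rng.choice(RULES[symbol])
    cache = {}                       # shared nonterminals expand identically

    def expand(tok):
        if tok not in RULES:
            return tok, tok
        if tok not in cache:
            cache[tok] = sample(tok, rng)
        return cache[tok]

    src = " ".join(expand(t)[0] for t in src_rhs)
    tgt = " ".join(expand(t)[1] for t in tgt_rhs)
    return src, tgt

def recombine(n, seed=0):
    """Generate n recombined training examples by sampling from the grammar."""
    rng = random.Random(seed)
    return [sample("S", rng) for _ in range(n)]
```

In the paper's setting, the grammar is induced from training data rather than hand-written, and the sampled pairs are simply appended to T5's fine-tuning set, so the downstream model needs no architectural changes.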