Sequence-to-sequence models excel at handling natural language variation, but have been shown to struggle with out-of-distribution compositional generalization. This has motivated new specialized architectures with stronger compositional biases, but most of these approaches have only been evaluated on synthetically-generated datasets, which are not representative of natural language variation. In this work we ask: can we develop a semantic parsing approach that handles both natural language variation and compositional generalization? To better assess this capability, we propose new train and test splits of non-synthetic datasets. We demonstrate that strong existing approaches do not perform well across a broad set of evaluations. We also propose NQG-T5, a hybrid model that combines a high-precision grammar-based approach with a pre-trained sequence-to-sequence model. It outperforms existing approaches across several compositional generalization challenges on non-synthetic data, while also being competitive with the state-of-the-art on standard evaluations. While still far from solving this problem, our study highlights the importance of diverse evaluations and the open challenge of handling both compositional generalization and natural language variation in semantic parsing.