Natural language is characterized by compositionality: the meaning of a complex expression is constructed from the meanings of its constituent parts. To facilitate the evaluation of the compositional abilities of language processing architectures, we introduce COGS, a semantic parsing dataset based on a fragment of English. The evaluation portion of COGS contains multiple systematic gaps that can only be addressed by compositional generalization; these include new combinations of familiar syntactic structures and new combinations of familiar words and familiar structures. In experiments with Transformers and LSTMs, we found that in-distribution accuracy on the COGS test set was near-perfect (96--99%), but generalization accuracy was substantially lower (16--35%) and showed high sensitivity to random seed ($\pm$6--8%). These findings indicate that contemporary standard NLP models are limited in their compositional generalization capacity, and position COGS as a useful benchmark for measuring progress.