Compositionality -- the ability to combine familiar units like words into novel phrases and sentences -- has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over primitive units, like words, while maximizing the compound divergence: the dissimilarity between test and train distributions over larger structures, like phrases. Dependency parsing, however, lacks a compositional generalization benchmark. In this work, we introduce a gold-standard set of dependency parses for CFQ, and use this to analyze the behavior of a state-of-the-art dependency parser (Qi et al., 2020) on the CFQ dataset. We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance. Additionally, we find that the dependency parser's performance does not degrade uniformly with compound divergence, and that the parser performs differently on different splits with the same compound divergence. We explore a number of hypotheses for what causes this non-uniform degradation in dependency parsing performance, and identify a number of syntactic structures that drive the parser's lower performance on the most challenging splits.