Although sequence-to-sequence models often achieve good performance in semantic parsing for i.i.d. data, their performance is still inferior in compositional generalization. Several data augmentation methods have been proposed to alleviate this problem. However, prior work only leveraged superficial grammar or rules for data augmentation, which resulted in limited improvement. We propose to use subtree substitution for compositional data augmentation, where we consider subtrees with similar semantic functions as exchangeable. Our experiments showed that such augmented data led to significantly better performance on SCAN and GeoQuery, and reached new SOTA on compositional split of GeoQuery.
翻译:虽然序列到序列模型在对i.d.数据进行语义解析方面往往表现良好,但其性能在组成概括方面仍然较差。为缓解这一问题,提出了若干数据增强方法。然而,先前的工作只利用表面语法或数据增强规则,导致有限的改进。我们提议使用亚树替代合成数据增强,我们认为具有类似语义功能的亚树可以互换。我们的实验表明,这种扩大的数据使SCAN和GeoQuery的性能大大改善,并实现了关于GeoQuery的构成分法的新SOTA。