A growing body of research has demonstrated that NLP models fail to generalize compositionally and has tried to alleviate this through specialized architectures, training schemes, and data augmentation, among other approaches. In this work, we study a different approach: training on instances with diverse structures. We propose a model-agnostic algorithm for subsampling such sets of instances from a labeled instance pool with structured outputs. Evaluating on both compositional template splits and traditional IID splits of 5 semantic parsing datasets of varying complexity, we show that structurally diverse training using our algorithm leads to comparable or better generalization than prior algorithms in 9 out of 10 dataset/split-type pairs. In general, we find that structural diversity consistently improves sample efficiency compared to random training sets. Moreover, we show that structurally diverse sampling yields comprehensive test sets that are substantially more challenging than IID test sets. Finally, we provide two explanations for the improved generalization from diverse training sets: 1) improved coverage of output substructures, and 2) a reduction in spurious correlations between these substructures.
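The abstract does not spell out the subsampling algorithm itself, but a minimal sketch can convey the idea of selecting for structural diversity. The sketch below is a hypothetical instantiation, not the paper's actual method: a greedy subsampler that repeatedly picks the instance whose output program contributes the most not-yet-covered substructures. The names `diverse_subsample` and `substructures`, and the use of token bigrams of the output as a stand-in for "substructures", are all illustrative assumptions.

```python
import random


def substructures(output, n=2):
    """Extract output substructures as token n-grams of the target program.
    A stand-in for the paper's notion of substructure (e.g., subtrees of a
    logical form); bigrams are an illustrative simplification."""
    toks = output.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def diverse_subsample(pool, k, n=2, seed=0):
    """Greedily pick k instances from `pool` (a list of (input, output)
    pairs), at each step choosing the instance whose output adds the most
    substructures not yet covered by the selected set."""
    rng = random.Random(seed)
    remaining = list(pool)
    rng.shuffle(remaining)  # randomize order so ties break randomly
    covered, selected = set(), []
    for _ in range(min(k, len(remaining))):
        # Instance with the largest number of uncovered substructures.
        best = max(remaining, key=lambda ex: len(substructures(ex[1], n) - covered))
        remaining.remove(best)
        covered |= substructures(best[1], n)
        selected.append(best)
    return selected
```

Under this reading, a small diversely sampled training set covers many more output substructures than a random sample of the same size, which matches explanation 1) above; because each new instance is chosen for the substructures it adds rather than for co-occurring patterns, it is also plausible that spurious correlations between substructures are weakened, as in explanation 2).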