Pre-trained Transformer-based neural architectures have consistently achieved state-of-the-art performance on the Natural Language Inference (NLI) task. Since NLI examples encompass a variety of linguistic, logical, and reasoning phenomena, it remains unclear which specific concepts trained systems learn and where they achieve strong generalization. To investigate this question, we propose a taxonomic hierarchy of categories relevant to the NLI task. We introduce TAXINLI, a new dataset of 10k examples from the MNLI dataset (Williams et al., 2018) annotated with these taxonomic labels. Through various experiments on TAXINLI, we observe that while SOTA neural models achieve near-perfect accuracy on certain taxonomic categories (a large jump over previous models), other categories remain difficult. Our work adds to the growing body of literature exposing gaps in current NLI systems and datasets through a systematic presentation and analysis of reasoning categories.