Much of the recent progress in NLU has been shown to stem from models learning dataset-specific heuristics. We conduct a case study of generalization in NLI (from MNLI to the adversarially constructed HANS dataset) across a range of BERT-based architectures (adapters, Siamese Transformers, HEX debiasing), as well as with subsampling the training data and increasing the model size. We report two successful and three unsuccessful strategies, all providing insights into how Transformer-based models learn to generalize.