To quantitatively and intuitively explore the generalization ability of pre-trained language models (PLMs), we design several arithmetic and logical reasoning tasks. We analyse how well PLMs generalize both when the test data follows the same distribution as the training data and when it does not; for the latter analysis, we construct a cross-distribution test set in addition to the in-distribution test set. We conduct experiments on one of the most advanced publicly released generative PLMs, BART. Our research finds that PLMs generalize easily when the test distribution matches the training distribution, but it is still difficult for them to generalize out of distribution.
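The abstract contrasts in-distribution and cross-distribution evaluation of a generative PLM on arithmetic questions. As an illustration only, the sketch below shows how such an evaluation query could look with the HuggingFace `transformers` BART interface; the checkpoint path, the question format, and the operand-range split are assumptions made for this example, not details taken from the paper.

```python
# A minimal sketch of querying a BART-style seq2seq model on arithmetic
# reasoning examples via the HuggingFace transformers API.
# "path/to/finetuned-bart-arithmetic" is a hypothetical placeholder for a
# model fine-tuned on the in-distribution training data; it is not a
# checkpoint released with this work.
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "path/to/finetuned-bart-arithmetic"  # hypothetical checkpoint
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def predict(question: str) -> str:
    """Generate the model's answer string for a single reasoning question."""
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=16, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# In-distribution example: operand range assumed to be seen during training.
print(predict("What is 23 + 45?"))
# Cross-distribution example: operands larger than any seen during training.
print(predict("What is 4823 + 1945?"))
```

Comparing accuracy on the two kinds of queries is one concrete way to separate in-distribution generalization from out-of-distribution generalization as described in the abstract.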