Existing analyses of pre-trained Transformers usually focus on only one or two model families at a time, overlooking the variability of architectures and pre-training objectives. In our work, we apply the oLMpics benchmark and psycholinguistic probing datasets to a diverse set of 29 models, including T5, BART, and ALBERT. Additionally, we adapt the oLMpics zero-shot setup for autoregressive models and evaluate GPT networks of different sizes. Our findings show that none of these models can resolve compositional questions in a zero-shot fashion, suggesting that this skill is not learnable using existing pre-training objectives. Furthermore, we find that global model decisions, such as architecture, directionality, dataset size, and pre-training objective, are not predictive of a model's linguistic capabilities.
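To make the autoregressive adaptation concrete, the sketch below shows one way to score oLMpics-style multiple-choice questions with a left-to-right model: each candidate answer is substituted into the blank, and the completed sentences are ranked by their total log-likelihood under GPT-2. The model checkpoint, the `[MASK]` placeholder convention, and the example question are illustrative assumptions, not necessarily the exact protocol used in the paper.

```python
# Minimal sketch: zero-shot multiple-choice scoring with an autoregressive LM.
# Assumption: candidates are compared by the log-likelihood of the completed sentence.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_log_likelihood(text: str) -> float:
    """Summed token log-probability of `text` under the autoregressive LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy loss
        # over the predicted (shifted) tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)  # convert mean NLL to a summed score

def pick_answer(question: str, candidates: list[str]) -> str:
    """Fill the blank with each candidate and keep the highest-scoring sentence."""
    scores = {c: sentence_log_likelihood(question.replace("[MASK]", c)) for c in candidates}
    return max(scores, key=scores.get)

# Hypothetical oLMpics-style "age comparison" question.
print(pick_answer("A 21 year old person is [MASK] than a 35 year old person.", ["younger", "older"]))
```

Length normalization (e.g., dividing by the number of candidate tokens) is a common variant of this scoring scheme; the choice can matter when candidates differ in token length.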