Despite the success of fine-tuning pretrained language encoders like BERT for downstream natural language understanding (NLU) tasks, it is still poorly understood how neural networks change after fine-tuning. In this work, we use centered kernel alignment (CKA), a method for comparing learned representations, to measure the similarity of representations in task-tuned models across layers. In experiments across twelve NLU tasks, we discover a consistent block-diagonal structure in the similarity of representations within fine-tuned RoBERTa and ALBERT models: representations are strongly similar within clusters of earlier and later layers, but not across the two clusters. The similarity of later-layer representations implies that these layers contribute only marginally to task performance, and we verify in experiments that the top few layers of fine-tuned Transformers can be discarded without hurting performance, even with no further tuning.
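To make the measurement concrete, below is a minimal sketch of how linear CKA could be computed between per-layer hidden states to obtain the kind of layer-by-layer similarity matrix described above. This is an illustration under stated assumptions, not the paper's released code: `linear_cka`, `cka_matrix`, and `layer_outputs` are hypothetical names, and the formula is the standard linear CKA of Kornblith et al. (2019).

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two representation matrices.

    x: (n_examples, d1) activations from one layer
    y: (n_examples, d2) activations from another layer
    Returns a similarity score in [0, 1].
    """
    # Center each feature dimension across examples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)

def cka_matrix(layer_outputs):
    """Pairwise linear CKA over a list of (n_examples, hidden_size) arrays,
    one per Transformer layer, collected on a fixed evaluation set."""
    n = len(layer_outputs)
    sims = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sims[i, j] = linear_cka(layer_outputs[i], layer_outputs[j])
    return sims

# Example (hypothetical): visualizing cka_matrix(layer_outputs) for a
# fine-tuned encoder would show the block-diagonal structure, i.e. high
# similarity within the earlier-layer block and within the later-layer
# block, but low similarity between the two blocks.
```

A matrix like this, computed separately for each fine-tuned model, is what reveals the earlier-layer and later-layer clusters described in the abstract.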