Recent years have witnessed the prevalent application of pre-trained language models (PLMs) in NLP. From the perspective of parameter space, PLMs provide a generic initialization from which high-performance minima can be found. Although plenty of works have studied how to effectively and efficiently adapt PLMs to such minima, little is known about how the various minima reached under different adaptation configurations are connected. In this paper, we investigate the geometric connections of different minima through the lens of mode connectivity, which measures whether two minima can be connected by a low-loss path. We conduct empirical analyses around three questions: (1) How do hyperparameters, specific tuning methods, and training data affect a PLM's mode connectivity? (2) How does mode connectivity change during pre-training? (3) How does the PLM's task knowledge change along the path connecting two minima? In general, exploring the mode connectivity of PLMs helps us understand the geometric connection of different minima, which may shed light on the inner workings of PLM downstream adaptation.
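To make the notion of mode connectivity concrete, below is a minimal sketch of a linear-interpolation probe between two fine-tuned checkpoints of the same PLM. The function names, the `eval_loss_fn` callback, and the choice of a straight-line path are illustrative assumptions, not the paper's exact protocol; a flat, low-loss curve along the path suggests the two minima are (linearly) mode-connected, while a pronounced barrier suggests they are not.

```python
# Hypothetical sketch: probe linear mode connectivity between two minima
# (checkpoints) of the same pre-trained model. Not the authors' exact setup.
import copy
import torch


def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Return theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b.

    Non-floating-point entries (e.g., integer buffers) are copied from sd_a
    unchanged to avoid dtype issues.
    """
    return {
        k: ((1.0 - alpha) * sd_a[k] + alpha * sd_b[k])
        if sd_a[k].is_floating_point() else sd_a[k]
        for k in sd_a
    }


def loss_along_path(model, sd_a, sd_b, eval_loss_fn, num_points=11):
    """Evaluate task loss at evenly spaced points on the linear path.

    `model` is an nn.Module with the same architecture as both checkpoints;
    `eval_loss_fn(model) -> float` computes the validation loss (assumed helper).
    Returns the per-point losses and the barrier height relative to the endpoints.
    """
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        probe = copy.deepcopy(model)
        probe.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        probe.eval()
        with torch.no_grad():
            losses.append(float(eval_loss_fn(probe)))
    barrier = max(losses) - max(losses[0], losses[-1])
    return losses, barrier
```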