Delta tuning (DET), also known as parameter-efficient tuning, is deemed a new paradigm for using pre-trained language models (PLMs). Up to now, various DETs with distinct design elements have been proposed, achieving performance on par with fine-tuning. However, the mechanisms behind this success remain under-explored, especially the connections among different DETs. To fathom this mystery, we hypothesize that the adaptations of different DETs can all be reparameterized as low-dimensional optimizations in a unified optimization subspace, which can be found by jointly decomposing independent solutions of different DETs. We then explore the connections among different DETs by conducting optimization within this subspace. In experiments, we find that, for a given DET, optimizing solely within the subspace achieves performance comparable to optimizing in its original space, and that a solution found in the subspace can be transferred to another DET with non-trivial performance. We also visualize the performance landscape of the subspace and find a substantial region where different DETs all perform well. Finally, we extend our analysis and reveal strong connections between fine-tuning and DETs.
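As a rough illustration of the idea described above, the following is a minimal sketch (not the paper's implementation) of finding a shared low-dimensional subspace by jointly decomposing flattened solutions of several DETs via SVD, and reparameterizing a solution as coordinates in that subspace. It assumes each DET solution can be flattened to a vector of the same length; all function names and dimensions are hypothetical.

```python
import numpy as np

def joint_subspace(solutions, dim):
    """solutions: (N, D) array of N independent DET solutions; dim: subspace size."""
    mean = solutions.mean(axis=0)
    # SVD of the centered solution matrix; the top right-singular vectors span the subspace.
    _, _, vt = np.linalg.svd(solutions - mean, full_matrices=False)
    basis = vt[:dim]                      # (dim, D) orthonormal basis
    return mean, basis

def to_subspace(delta, mean, basis):
    """Reparameterize a full delta as low-dimensional coordinates z."""
    return basis @ (delta - mean)         # (dim,)

def from_subspace(z, mean, basis):
    """Map coordinates back to a full delta; optimization is then carried out over z only."""
    return mean + basis.T @ z             # (D,)

# Toy usage: 8 synthetic solutions in a 1000-dim delta space, projected to a 5-dim subspace.
rng = np.random.default_rng(0)
sols = rng.normal(size=(8, 1000))
mean, basis = joint_subspace(sols, dim=5)
z = to_subspace(sols[0], mean, basis)
recon = from_subspace(z, mean, basis)
```

Under this sketch, "conducting optimization within the subspace" corresponds to searching over the low-dimensional coordinates z and mapping them back to a full delta before evaluation.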