We show that low-rank adaptation of large-scale models suffers from a low stable rank that is well below the linear algebraic rank of the subspace, degrading fine-tuning performance. To mitigate the underutilization of the allocated subspace, we propose PoLAR, a parameterization inspired by the polar decomposition that factorizes the low-rank update into two direction matrices constrained to Stiefel manifolds and an unconstrained scale matrix. Our theory shows that PoLAR yields an exponentially faster convergence rate on a canonical low-rank adaptation problem. Pairing the parameterization with Riemannian optimization leads to consistent gains on three different benchmarks testing general language understanding, commonsense reasoning, and mathematical problem solving with base model sizes ranging from 350M to 27B.
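Below is a minimal sketch of the PoLAR-style parameterization described above, intended only as illustration: the additive low-rank update W0 + X D Yᵀ with X and Y constrained to have orthonormal columns (points on Stiefel manifolds) and D unconstrained. The helper names (`polar_update`, `stable_rank`) are hypothetical, and the Riemannian optimization of the Stiefel factors is omitted.

```python
import numpy as np

def stable_rank(A):
    # Stable rank: squared Frobenius norm over squared spectral norm,
    # a soft measure of how many directions of A carry significant energy.
    return np.linalg.norm(A, "fro") ** 2 / np.linalg.norm(A, 2) ** 2

def polar_update(W0, X, D, Y):
    # PoLAR-style update: W0 + X @ D @ Y.T, with X in St(m, r),
    # Y in St(n, r) (orthonormal columns) and D an r x r scale matrix.
    return W0 + X @ D @ Y.T

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4
W0 = rng.standard_normal((m, n))                   # frozen pretrained weight
X, _ = np.linalg.qr(rng.standard_normal((m, r)))   # orthonormal columns: X on St(m, r)
Y, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal columns: Y on St(n, r)
D = rng.standard_normal((r, r))                    # unconstrained scale matrix
W = polar_update(W0, X, D, Y)
print(stable_rank(X @ D @ Y.T))                    # <= r; low values indicate subspace underuse
```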