With the development of large-scale models, traditional distributed bilevel optimization algorithms cannot be directly applied on low-resource clients. The key reason is the excessive computation required to optimize both the lower- and upper-level functions. Thus, we present the first resource-adaptive distributed bilevel optimization framework with a second-order-free hypergradient estimator, which allows each client to optimize a submodel adapted to its available resources. Due to the coupled influence of the partial outer parameters $x$ and inner parameters $y$, it is challenging to theoretically bound the globally averaged hypergradient with respect to the full model parameters. The error bound of the inner parameters also needs to be reformulated because of local partial training. Our theorems show that both RABO and RAFBO achieve an asymptotically optimal convergence rate of $O(1/\sqrt{C_x^{\ast}Q})$, which is dominated by the minimum coverage of the outer parameter, $C_x^{\ast}$. Extensive experiments on two different tasks demonstrate the effectiveness and computational efficiency of our proposed methods.
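For intuition, one well-known way to obtain a second-order-free hypergradient estimate (an illustrative sketch only, using placeholder notation $f$ for the upper-level objective, $g$ for the lower-level objective, and a penalty weight $\lambda$; not necessarily the exact estimator used in RABO/RAFBO) is the penalty-based, fully first-order form
$$\widehat{\nabla}\Phi(x) \;=\; \nabla_x f(x, y_{\lambda}) \;+\; \lambda\big(\nabla_x g(x, y_{\lambda}) - \nabla_x g(x, y_g)\big),$$
where $y_{\lambda} \approx \arg\min_{y}\{f(x,y) + \lambda\, g(x,y)\}$ and $y_g \approx \arg\min_{y} g(x,y)$ are computed by local first-order updates, so no Hessian or Hessian-vector product is ever formed.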