In recent years, decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities thanks to their versatility in modeling decentralized learning problems over peer-to-peer networks (e.g., multi-agent meta-learning, multi-agent reinforcement learning, personalized training, and Byzantine-resilient learning). However, for decentralized bilevel optimization over peer-to-peer networks with limited computation and communication capabilities, achieving low sample complexity and low communication complexity are two fundamental challenges that remain under-explored. In this paper, we make the first attempt to investigate the class of decentralized bilevel optimization problems in which the outer subproblem is nonconvex and the inner subproblem is strongly convex. Our main contributions are two-fold: (i) We propose a deterministic algorithm called INTERACT (inner-gradient-descent-outer-tracked-gradient) that requires a sample complexity of $\mathcal{O}(n \epsilon^{-1})$ and a communication complexity of $\mathcal{O}(\epsilon^{-1})$ to solve the bilevel optimization problem, where $n$ is the number of samples at each agent and $\epsilon > 0$ is the desired stationarity gap. (ii) To relax the need for full gradient evaluations in each iteration, we propose a stochastic variance-reduced version of INTERACT (SVR-INTERACT), which improves the sample complexity to $\mathcal{O}(\sqrt{n} \epsilon^{-1})$ while achieving the same communication complexity as the deterministic algorithm. To our knowledge, this work is the first to achieve both low sample complexity and low communication complexity for solving decentralized bilevel optimization problems over networks. Our numerical experiments corroborate our theoretical findings.
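
For orientation, the problem class described above can be written in the generic form common in the bilevel optimization literature. This is only a sketch under assumed notation: the number of agents $m$, the local outer and inner objectives $f_i$ and $g_i$, the variables $x \in \mathbb{R}^{p}$ and $y \in \mathbb{R}^{q}$, and the shorthand $\ell_i$ do not appear in the abstract and are introduced here purely for illustration.

$$
\min_{x \in \mathbb{R}^{p}} \; \ell(x) := \frac{1}{m} \sum_{i=1}^{m} f_i\bigl(x, y_i^{*}(x)\bigr)
\quad \text{s.t.} \quad
y_i^{*}(x) = \arg\min_{y \in \mathbb{R}^{q}} g_i(x, y), \qquad i = 1, \dots, m,
$$

where each $f_i$ may be nonconvex in $x$ and each $g_i(x, \cdot)$ is assumed strongly convex, so that $y_i^{*}(x)$ is unique. Writing $\ell_i(x) := f_i\bigl(x, y_i^{*}(x)\bigr)$ and assuming sufficient smoothness, the implicit function theorem gives the local outer gradient (all derivatives evaluated at $(x, y_i^{*}(x))$):

$$
\nabla \ell_i(x) = \nabla_x f_i - \nabla^2_{xy} g_i \, \bigl[\nabla^2_{yy} g_i\bigr]^{-1} \nabla_y f_i .
$$

One common way to read "stationarity gap $\epsilon$" is $\|\nabla \ell(x)\|^2 \le \epsilon$, possibly combined with a consensus-error term across agents; the abstract does not specify which metric is used.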