Training at the edge uses continuously evolving data generated at different locations. Privacy concerns prohibit co-locating this spatially and temporally distributed data, which makes it crucial to design training algorithms that enable efficient continual learning over decentralized private data. Decentralized learning allows serverless training with spatially distributed data. A fundamental barrier in such distributed learning is the high bandwidth cost of communicating model updates between agents. Moreover, existing works under this training paradigm are not inherently suited to learning a temporal sequence of tasks while retaining previously acquired knowledge. In this work, we propose CoDeC, a novel communication-efficient decentralized continual learning algorithm that addresses these challenges. We mitigate catastrophic forgetting while learning a task sequence in a decentralized setup by combining orthogonal gradient projection with gossip averaging across decentralized agents. Further, CoDeC includes a novel lossless communication compression scheme based on the gradient subspaces: we express layer-wise gradients as a linear combination of the basis vectors of these subspaces and communicate the associated coefficients. We theoretically analyze the convergence rate of our algorithm and demonstrate through an extensive set of experiments that CoDeC successfully learns distributed continual tasks with minimal forgetting. The proposed compression scheme reduces communication costs by up to 4.8x while matching the performance of the full-communication baseline.
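To make the compression idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single flattened layer gradient, a shared orthonormal basis that every agent already holds, and a 4-agent ring with uniform mixing weights; the names `protected`, `free`, and `mixing`, and all dimensions, are hypothetical choices for illustration. Each agent projects its gradient orthogonally to the subspace kept for earlier tasks, sends only the coefficients of the result in the remaining subspace, and gossip averaging is applied to those coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, protected_rank, n_agents = 64, 48, 4  # illustrative sizes (assumptions)

# Orthonormal basis of R^dim, split into a subspace reserved for earlier
# tasks ("protected") and its orthogonal complement ("free").
q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
protected, free = q[:, :protected_rank], q[:, protected_rank:]

# One gradient per agent, stored as rows of shape (n_agents, dim).
grads = rng.standard_normal((n_agents, dim))

# Orthogonal gradient projection: remove the component in span(protected).
projected = grads - (grads @ protected) @ protected.T

# Compression: the projected gradient lies in span(free), so its
# coefficients in that basis describe it exactly (up to float rounding).
coeffs = projected @ free  # shape (n_agents, dim - protected_rank)

# Gossip averaging on a ring: each agent mixes with its two neighbours.
mixing = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in (i - 1, i, i + 1):
        mixing[i, j % n_agents] = 1.0 / 3.0
mixed_coeffs = mixing @ coeffs

# Decompression on the receiving side: rebuild full-size gradients.
reconstructed = mixed_coeffs @ free.T

# Values sent per agent shrink from dim to dim - protected_rank.
compression_ratio = dim / coeffs.shape[1]
print(f"values per message: {coeffs.shape[1]} instead of {dim}")
```

Because both projection and averaging are linear, averaging the coefficients and then reconstructing gives the same result as averaging the full projected gradients. In this sketch the saving grows with the dimension of the protected subspace, so it depends entirely on the assumed ranks and should not be read as reproducing the reported 4.8x figure.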