We consider decentralized model training in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, with the silo's vertical data shard partitioned horizontally across its clients. We propose Tiered Decentralized Coordinate Descent (TDCD), a communication-efficient decentralized training algorithm for such two-tiered networks. To reduce communication overhead, the clients in each silo perform multiple local gradient steps before sharing updates with their hub. Each hub adjusts its coordinates by averaging its clients' updates, and the hubs then exchange intermediate updates with one another. We present a theoretical analysis of our algorithm and show how the convergence rate depends on the number of vertical partitions, the number of local updates, and the number of clients in each hub. We further validate our approach empirically via simulation-based experiments on a variety of datasets with both convex and non-convex objectives.
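To make the two-tiered structure concrete, the following is a minimal simulation sketch of the training pattern the abstract describes, not the paper's algorithm as specified: features are split vertically across silos, each silo's samples are split horizontally across its clients, clients take several local gradient steps against a stale view of the other silos' contributions, hubs average their clients' updates, and hubs then exchange intermediate partial predictions. The constants `K`, `M`, `Q`, `lr`, and `rounds`, the linear-regression objective, and the choice of partial predictions as the exchanged quantity are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vertically-and-horizontally partitioned linear regression.
N, D = 200, 8
K = 2          # silos (vertical feature partitions)      -- assumed
M = 4          # clients per silo (sample partitions)     -- assumed
Q = 5          # local gradient steps between hub rounds  -- assumed
lr = 0.05
rounds = 100

X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true + 0.1 * rng.normal(size=N)

feat_blocks = np.array_split(np.arange(D), K)    # one feature block per silo
sample_blocks = np.array_split(np.arange(N), M)  # one sample block per client

# Each silo's hub maintains the weights for its own coordinate block.
w = [np.zeros(len(fb)) for fb in feat_blocks]

for r in range(rounds):
    # Hubs exchange intermediate updates: each silo's partial prediction
    # over all samples, computed with the current synchronized weights.
    partial = [X[:, fb] @ w[k] for k, fb in enumerate(feat_blocks)]
    total_pred = np.sum(partial, axis=0)

    new_w = []
    for k, fb in enumerate(feat_blocks):
        # Other silos' contribution is held fixed (stale) during local steps.
        others = total_pred - partial[k]
        client_w = []
        for s in sample_blocks:
            wk = w[k].copy()
            for _ in range(Q):  # multiple local gradient steps per client
                resid = X[s][:, fb] @ wk + others[s] - y[s]
                grad = X[s][:, fb].T @ resid / len(s)
                wk -= lr * grad
            client_w.append(wk)
        # Hub averages its clients' updates for its coordinate block.
        new_w.append(np.mean(client_w, axis=0))
    w = new_w

pred = sum(X[:, fb] @ w[k] for k, fb in enumerate(feat_blocks))
print(f"final squared loss: {0.5 * np.mean((pred - y) ** 2):.4f}")
```

Under this sketch, communication happens only once per outer round (clients send their local iterates to the hub, and hubs share partial predictions), while the `Q` inner steps are purely local, which is the communication-saving trade-off the abstract highlights.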