The impact of local averaging on the performance of federated learning (FL) systems is studied in the presence of communication delay between the clients and the parameter server. To minimize the effect of delay, clients are assigned to different groups, each having its own local parameter server (LPS) that aggregates its clients' models. The groups' models are then aggregated at a global parameter server (GPS) that only communicates with the LPSs. Such a setting is known as hierarchical FL (HFL). Different from most works in the literature, the number of local and global communication rounds in our work is randomly determined by the (different) delays experienced by each group of clients. Specifically, the number of local averaging rounds is tied to a wall-clock time period coined the sync time $S$, after which the LPSs synchronize their models by sharing them with the GPS. This sync time $S$ is then reapplied until a global wall-clock time budget is exhausted.
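To make the delay-driven protocol concrete, the following is a minimal simulation sketch, not the paper's actual algorithm. It assumes models are flat NumPy vectors, client delays are i.i.d. exponential, each group's local round is gated by its slowest client, and local training is stood in for by a toy noisy gradient step; all names (`hfl_sync`, `sync_time`, `total_time`) are hypothetical. Each group then completes a random, delay-dependent number of local averaging rounds per sync period of length $S$, after which the GPS averages the LPS models, repeating until the global wall-clock budget is spent.

```python
import numpy as np

rng = np.random.default_rng(0)

def hfl_sync(num_groups=3, clients_per_group=4, dim=10,
             sync_time=5.0, total_time=20.0):
    """Simulate delay-driven hierarchical FL with a wall-clock sync time S.

    Within each sync period, a group's number of local averaging rounds
    is random: it depends on the delays of that group's clients, so
    different groups complete different numbers of rounds.
    """
    # Per-client communication delays (assumed i.i.d. exponential here).
    delays = rng.exponential(1.0, size=(num_groups, clients_per_group))
    global_model = np.zeros(dim)

    elapsed = 0.0
    while elapsed + sync_time <= total_time:
        group_models = []
        for g in range(num_groups):
            model = global_model.copy()
            round_time = delays[g].max()  # slowest client gates the round
            local_clock = 0.0
            while local_clock + round_time <= sync_time:
                # Toy stand-in for local SGD: each client perturbs the
                # model, then the LPS averages the clients' updates.
                updates = [model - 0.1 * rng.standard_normal(dim)
                           for _ in range(clients_per_group)]
                model = np.mean(updates, axis=0)
                local_clock += round_time
            group_models.append(model)
        # After S wall-clock units, the GPS averages the LPS models.
        global_model = np.mean(group_models, axis=0)
        elapsed += sync_time
    return global_model

print(hfl_sync())
```

Under these assumptions, a group with a slow straggler completes few (possibly zero) local rounds before each synchronization, which is exactly the randomness in round counts that the abstract highlights.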