Federated Learning is a framework that jointly trains a model \textit{with} complete knowledge on a remote centralized server, but \textit{without} requiring access to the data stored on distributed machines. Some prior work assumes that the data generated by edge devices are independently and identically sampled from a common population distribution. However, such ideal sampling may not be realistic in many contexts. Moreover, models based on intrinsic agency, such as active sampling schemes, may lead to highly biased sampling. A pressing question is therefore how robust Federated Learning is to biased sampling. In this work\footnote{\url{https://github.com/jiaqian/robustness_of_FL}}, we experimentally investigate two such scenarios. First, we study a centralized classifier aggregated from a collection of local classifiers trained on data with categorical heterogeneity. Second, we study a classifier aggregated from a collection of local classifiers trained on data obtained through active sampling at the edge. In both scenarios, we present evidence that Federated Learning is robust to data heterogeneity when the number of local training iterations and the communication frequency are appropriately chosen.
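To make the first scenario concrete, the following is a minimal sketch of weighted model averaging under extreme categorical heterogeneity, where each client holds examples of a single class only. It is not taken from the authors' repository: the toy Gaussian data, the logistic-regression model, and all names (`make_blob`, `local_train`, `federated_average`, `local_steps`, `rounds`) are assumptions chosen for illustration. The two knobs named in the abstract appear as `local_steps` (local training iterations per round) and `rounds` (how often clients communicate with the server).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_blob(rng, center, label, n):
    # Hypothetical toy data: n 2-D Gaussian points around `center`, all with one label.
    return [((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)), label)
            for _ in range(n)]


def local_train(weights, data, local_steps, lr):
    # Full-batch gradient descent on the logistic loss, starting from the server weights.
    w0, w1, b = weights
    n = len(data)
    for _ in range(local_steps):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in data:
            err = sigmoid(w0 * x0 + w1 * x1 + b) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return (w0, w1, b)


def federated_average(client_data, rounds, local_steps, lr=0.5):
    # The server only ever sees model weights, never the clients' raw data.
    weights = (0.0, 0.0, 0.0)
    total = sum(len(d) for d in client_data)
    for _ in range(rounds):
        local_models = [local_train(weights, d, local_steps, lr) for d in client_data]
        # Average the local models, weighting each client by its sample count.
        weights = tuple(
            sum(len(d) * m[i] for d, m in zip(client_data, local_models)) / total
            for i in range(3)
        )
    return weights


def accuracy(weights, data):
    w0, w1, b = weights
    hits = sum((sigmoid(w0 * x0 + w1 * x1 + b) >= 0.5) == (y == 1)
               for (x0, x1), y in data)
    return hits / len(data)


rng = random.Random(0)
# Extreme label skew: client 0 holds only class 0, client 1 holds only class 1.
clients = [make_blob(rng, (-2.0, -2.0), 0, 100), make_blob(rng, (2.0, 2.0), 1, 100)]
# Balanced held-out set drawn from both classes.
test_set = make_blob(rng, (-2.0, -2.0), 0, 200) + make_blob(rng, (2.0, 2.0), 1, 200)

global_weights = federated_average(clients, rounds=20, local_steps=5)
test_accuracy = accuracy(global_weights, test_set)
print(f"test accuracy: {test_accuracy:.3f}")
```

Varying `local_steps` and `rounds` in this toy setting is one way to explore the trade-off the abstract refers to; the sketch makes no claim about the datasets, models, or results reported in the paper.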