Federated Learning (FL) applied to real-world data may suffer from several idiosyncrasies. One such idiosyncrasy is the data distribution across devices: data may be distributed such that a few "heavy devices" hold large amounts of data while many "light users" contribute only a handful of data points. Data is also heterogeneous across devices. In this study, we evaluate the impact of such idiosyncrasies on Natural Language Understanding (NLU) models trained using FL. We conduct experiments on data obtained from a large-scale NLU system serving thousands of devices and show that simple non-uniform device selection, based on the number of interactions at each round of FL training, boosts model performance. This benefit is further amplified in continual FL over consecutive time periods, where non-uniform sampling quickly catches up with FL methods that use all data at once.
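Below is a minimal sketch of one way the non-uniform device selection described above could be realized: at each FL round, devices are sampled with probability proportional to their interaction counts rather than uniformly. The function name `sample_devices` and the example counts are hypothetical; the paper's exact weighting scheme may differ.

```python
import numpy as np

def sample_devices(interaction_counts, num_selected, rng=None):
    """Sample device indices with probability proportional to interaction count.

    interaction_counts: per-device number of interactions (data points).
    num_selected: number of devices to pick for this FL round.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(interaction_counts, dtype=float)
    probs = counts / counts.sum()
    # Non-uniform sampling: heavy devices are more likely to be selected.
    return rng.choice(len(counts), size=num_selected, replace=False, p=probs)

# Illustrative population: a few "heavy devices" and many "light users".
counts = [500, 450, 10, 8, 5, 3, 2, 2, 1, 1]
selected = sample_devices(counts, num_selected=3)
print(selected)
```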