Federated learning (FL) is an effective solution to train machine learning models on the increasing amount of data generated by IoT devices and smartphones while keeping such data localized. Most previous work on federated learning assumes that clients operate on static datasets collected before training starts. This approach may be inefficient because (1) it ignores new samples clients collect during training, and (2) it may require a potentially long preparatory phase for clients to collect enough data. Moreover, learning on static datasets may be simply impossible in scenarios with small aggregate storage across devices. It is, therefore, necessary to design federated algorithms able to learn from data streams. In this work, we formulate and study the problem of \emph{federated learning for data streams}. We propose a general FL algorithm to learn from data streams through a suitably weighted empirical risk minimization. Our theoretical analysis provides insights to configure such an algorithm, and we evaluate its performance on a wide range of machine learning tasks.
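The weighted empirical risk minimization idea can be sketched as follows. This is an illustrative assumption, not the paper's exact algorithm: each client keeps a bounded buffer of recent stream samples, down-weights older samples with an exponential decay (an assumed weighting scheme), runs a few steps of weighted SGD locally, and the server aggregates client models FedAvg-style. The names `client_update` and `server_round` are hypothetical.

```python
# Hedged sketch of federated weighted ERM over data streams (assumed scheme,
# not the paper's algorithm). Model: linear regression, squared loss.
import numpy as np

def client_update(w, stream_batch, buffer, weights,
                  decay=0.9, lr=0.1, steps=5, maxlen=64):
    # Age existing sample weights, then append the new batch with weight 1
    # (exponential decay is an assumed choice of weighting).
    weights = [wt * decay for wt in weights] + [1.0] * len(stream_batch)
    buffer = buffer + stream_batch
    # Bounded storage: keep only the most recent samples.
    buffer, weights = buffer[-maxlen:], weights[-maxlen:]
    X = np.array([x for x, _ in buffer])          # (n, d) features
    y = np.array([t for _, t in buffer])          # (n,) targets
    a = np.array(weights) / np.sum(weights)       # normalized sample weights
    for _ in range(steps):
        # Gradient of the weighted empirical risk sum_i a_i (x_i.w - y_i)^2.
        grad = 2 * X.T @ (a * (X @ w - y))
        w = w - lr * grad
    return w, buffer, weights

def server_round(w, client_streams, states):
    # states[c] = (buffer, weights) for client c; aggregate models uniformly.
    new_ws = []
    for c, batch in enumerate(client_streams):
        wc, buf, wts = client_update(w.copy(), batch, *states[c])
        states[c] = (buf, wts)
        new_ws.append(wc)
    return np.mean(new_ws, axis=0), states
```

On a noiseless stream with targets `y = 3x`, repeating `server_round` drives the shared model toward the true coefficient, since each client's weighted least-squares objective shares the same minimizer.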