Machine learning models have been deployed in mobile networks to process massive data from different layers, enabling automated network management and on-device intelligence. To overcome the high communication cost and severe privacy concerns of centralized machine learning, federated learning (FL) has been proposed to achieve distributed machine learning among networked devices. While computation and communication limitations have been widely studied, the impact of on-device storage on FL performance remains unexplored. Our experiments show that, without an effective data selection policy to filter the massive streaming data on devices, classical FL can suffer from $4\times$ longer model training time and a $7\%$ reduction in inference accuracy. In this work, we take the first step toward online data selection for FL with limited on-device storage. We first define a new data valuation metric for data evaluation and selection in FL, with theoretical guarantees for simultaneously speeding up model convergence and enhancing final model accuracy. We further design {\ttfamily ODE}, a framework of \textbf{O}nline \textbf{D}ata s\textbf{E}lection for FL, to coordinate networked devices to store valuable data samples. Experimental results on one industrial dataset and three public datasets show the remarkable advantages of {\ttfamily ODE} over state-of-the-art approaches. In particular, on the industrial dataset, {\ttfamily ODE} achieves up to a $2.5\times$ speedup in training time and a $6\%$ increase in inference accuracy, and is robust to various factors in practical environments.
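The storage constraint described above can be illustrated with a minimal sketch of a fixed-capacity on-device buffer that filters a data stream by a per-sample value score. This sketch does not implement the valuation metric or the coordination logic of {\ttfamily ODE}: the class `ValueBoundedBuffer`, the helper `fill_from_stream`, and the caller-supplied `value_fn` are hypothetical names introduced only for illustration, and `value_fn` is an assumed placeholder for whatever data valuation metric is used.

```python
import heapq
import itertools
from typing import Any, Callable, Iterable, List


class ValueBoundedBuffer:
    """Hypothetical fixed-capacity store that keeps the highest-valued samples seen so far.

    `value_fn` is an assumed placeholder for a data valuation metric;
    it is NOT the metric defined by ODE.
    """

    def __init__(self, capacity: int, value_fn: Callable[[Any], float]) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.value_fn = value_fn
        self._heap: List[tuple] = []  # min-heap of (value, arrival_index, sample)
        self._counter = itertools.count()  # unique tie-breaker, so samples are never compared

    def offer(self, sample: Any) -> bool:
        """Consider one streaming sample; return True if it is stored."""
        entry = (self.value_fn(sample), next(self._counter), sample)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if entry[0] > self._heap[0][0]:
            # Storage is full: evict the lowest-valued stored sample for the new one.
            heapq.heapreplace(self._heap, entry)
            return True
        return False  # new sample is not more valuable than anything stored; drop it

    def samples(self) -> List[Any]:
        """Stored samples, highest value first (earlier arrival wins ties)."""
        return [s for _, _, s in sorted(self._heap, key=lambda e: (-e[0], e[1]))]


def fill_from_stream(stream: Iterable[Any], capacity: int,
                     value_fn: Callable[[Any], float]) -> List[Any]:
    """Run one pass over a stream and return what the buffer retains."""
    buf = ValueBoundedBuffer(capacity, value_fn)
    for x in stream:
        buf.offer(x)
    return buf.samples()


print(fill_from_stream([5, 1, 9, 3, 7], capacity=3, value_fn=float))  # [9, 7, 5]
```

Each sample's value is computed once on arrival and then frozen. This is a simplification: in FL the value of a stored sample may change as the global model evolves, so a practical policy would likely need to re-score or age stored samples.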