Federated Learning (FL) trains a machine learning model on distributed clients without exposing individual data. Unlike centralized training, which is usually based on carefully organized data, FL deals with on-device data that are often unfiltered and imbalanced. As a result, the conventional FL training protocol, which treats all data equally, wastes local computational resources and slows down the global learning process. To this end, we propose FedBalancer, a systematic FL framework that actively selects clients' training samples. Our sample selection strategy prioritizes more "informative" samples while respecting the privacy and computational capabilities of clients. To better exploit sample selection for faster global training, we further introduce an adaptive deadline control scheme that predicts the optimal deadline for each round as client training data vary. Compared with existing FL algorithms combined with deadline configuration methods, our evaluation on five datasets from three different domains shows that FedBalancer improves time-to-accuracy performance by 1.22~4.62x and model accuracy by 1.0~3.3%. We also show that FedBalancer is readily applicable to other FL approaches by demonstrating that it improves convergence speed and accuracy when operating jointly with three different FL algorithms.
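To make the two mechanisms named above concrete, here is a minimal sketch, assuming per-sample training loss is used as the proxy for sample "informativeness" and a simple progress-per-time heuristic stands in for deadline prediction. All function and parameter names (select_samples, loss_threshold_percentile, predict_deadline) are hypothetical illustrations, not the paper's actual API; the paper's selection criteria and privacy handling are more involved.

```python
# Illustrative sketch only: loss-based sample selection plus a crude
# stand-in for adaptive deadline control. Names are hypothetical.
import numpy as np


def select_samples(per_sample_losses: np.ndarray,
                   loss_threshold_percentile: float = 50.0) -> np.ndarray:
    """Return indices of samples whose loss exceeds a percentile threshold.

    Higher-loss samples are treated as more informative, so each client
    trains on the subset the current global model handles worst.
    """
    threshold = np.percentile(per_sample_losses, loss_threshold_percentile)
    return np.nonzero(per_sample_losses > threshold)[0]


def predict_deadline(candidate_deadlines: list[float],
                     completed_fractions: list[float]) -> float:
    """Pick the candidate deadline that maximized progress per second.

    Among observed (deadline, fraction-of-clients-completed) pairs from
    past rounds, choose the deadline with the best completion/time ratio.
    """
    ratios = [f / d for d, f in zip(candidate_deadlines, completed_fractions)]
    return candidate_deadlines[int(np.argmax(ratios))]


# Example usage: a client keeps only its hardest 40% of samples this round.
losses = np.array([0.10, 2.30, 0.05, 1.70, 0.90])
selected = select_samples(losses, loss_threshold_percentile=60.0)
print(selected)  # indices of the highest-loss samples
```

The design intuition is that both knobs trade off per-round work against per-round progress: selecting fewer, harder samples shortens local training, and the deadline should then shrink accordingly so the server is not idly waiting on stragglers.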