Recent years have witnessed large amounts of decentralized data accumulating on the (edge) devices of end users, while aggregating this decentralized data for machine learning jobs remains complicated because of laws and regulations. As a practical approach to handling decentralized data, Federated Learning (FL) enables collaborative training of a global machine learning model without sharing sensitive raw data. During the FL training process, servers schedule devices to jobs; however, device scheduling with multiple jobs in FL remains a critical and open problem. In this paper, we propose a novel multi-job FL framework that enables the training of multiple jobs in parallel. The framework is composed of a system model and a scheduling method. The system model supports the parallel training process of multiple jobs, with a cost model based on data fairness and the training time of diverse devices during the parallel training process. We propose a novel intelligent scheduling approach comprising multiple scheduling methods, including an original reinforcement learning-based scheduling method and an original Bayesian optimization-based scheduling method, each of which schedules devices to multiple jobs at low cost. We conduct extensive experiments with diverse jobs and datasets. The experimental results show that our proposed approaches significantly outperform baseline approaches in terms of training time (up to 12.73 times faster) and accuracy (up to 46.4% higher).
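To make the cost model concrete, the following is a minimal illustrative sketch, not the paper's implementation: it combines the training time of the slowest scheduled device in a synchronous round with a data-fairness penalty based on how evenly devices have participated. All names here (round_cost, fairness_penalty, the alpha weight, the variance-based fairness proxy) are assumptions introduced for illustration.

```python
import statistics

def fairness_penalty(selection_counts):
    """Hypothetical fairness proxy: population variance of per-device
    participation counts (lower variance = more even participation)."""
    return statistics.pvariance(selection_counts)

def round_cost(device_times, scheduled, selection_counts, alpha=0.5):
    """Illustrative per-round cost for one job: a synchronous round
    finishes when the slowest scheduled device finishes, and a weighted
    fairness term accounts for all devices' historical participation."""
    time_cost = max(device_times[d] for d in scheduled)
    return time_cost + alpha * fairness_penalty(selection_counts)

# Toy usage: 5 devices, the job schedules devices 0 and 3 this round.
device_times = [4.0, 2.5, 6.1, 3.2, 5.0]   # per-round training time (s)
selection_counts = [3, 1, 0, 2, 1]          # prior selections per device
print(round_cost(device_times, [0, 3], selection_counts))
```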
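A scheduler can then search over device-to-job assignments to minimize such a cost. As a hedged sketch of the Bayesian-optimization flavor (again, not the paper's method), the loop below uses scikit-optimize's gp_minimize over categorical device choices, reusing the toy device_times above; the conflict penalty for assigning one device to several jobs at once is an assumption made for this example.

```python
from skopt import gp_minimize
from skopt.space import Categorical

n_jobs, n_devices = 3, 5

def objective(assignment):
    # assignment[j] = device index scheduled for job j; a device shared
    # by several jobs is treated as a conflict and heavily penalized.
    if len(set(assignment)) < len(assignment):
        return 1e6
    return sum(device_times[d] for d in assignment)

space = [Categorical(list(range(n_devices))) for _ in range(n_jobs)]
result = gp_minimize(objective, space, n_calls=25, random_state=0)
print("best assignment:", result.x, "cost:", result.fun)
```

A reinforcement learning-based scheduler would pursue the same objective by learning a policy over repeated scheduling rounds rather than by model-based search.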