In recent years, data and computing resources have typically been distributed across end-user devices, regions, and organizations. Because of laws and regulations, these distributed data and computing resources cannot be directly shared among different regions or organizations for machine learning tasks. Federated learning (FL) has emerged as an efficient approach to exploiting distributed data and computing resources to collaboratively train machine learning models while complying with laws and regulations and ensuring data security and data privacy. In this paper, we provide a comprehensive survey of existing work on federated learning. We propose a functional architecture of FL systems and a taxonomy of related techniques. Furthermore, we discuss distributed training, data communication, and security in FL systems. Finally, we analyze the limitations of existing FL systems and propose future research directions.