Federated learning is a privacy-preserving technique, based on data decentralization, for training machine learning or deep learning models securely. In this paper we present theoretical aspects of federated learning, including an aggregation operator, the different types of federated learning, and issues that arise from how data are distributed across clients, together with an exhaustive analysis of a use case in which the number of clients varies. Specifically, we propose a medical image analysis use case based on chest X-ray images obtained from an open data repository. In addition to the privacy advantages, we study improvements in predictions (in terms of accuracy and area under the curve) and reductions in execution time with respect to the classical, centralized approach. Different clients are simulated from the training data, selected in an unbalanced manner, i.e., they do not all hold the same amount of data. The results for three or ten clients are presented and compared with each other and with the centralized case. Two approaches are analyzed for the case of intermittent clients, since in a real scenario some clients may leave the training and new ones may join. The evolution of the test-set results in terms of accuracy, area under the curve, and execution time is shown as the number of clients into which the original data are divided increases. Finally, improvements and future work in the field are proposed.