In the distributed collaborative machine learning (DCML) paradigm, federated learning (FL) has recently attracted much attention due to its applications in health, finance, and emerging areas such as Industry 4.0 and smart vehicles. FL provides privacy by design: it trains a machine learning model collaboratively over several distributed clients (ranging from two to millions), such as mobile phones, without sharing their raw data with any other participant. In practical scenarios, however, not all clients have sufficient computing resources (e.g., Internet of Things devices), the machine learning model may have millions of parameters, and the privacy of the model between the server and the clients during training/testing is a prime concern (e.g., when the parties are rivals). FL alone is not sufficient in these scenarios, so split learning (SL) was introduced. SL suits these scenarios because it splits a model into multiple portions and distributes them among the clients and the server; each party trains/tests its own portion, and together they accomplish training/testing of the full model. In SL, the participants share neither their data nor their model portions with any other party, and usually a smaller network portion is assigned to the clients, where the data resides. Recently, a hybrid of FL and SL, called splitfed learning, has been introduced to combine the benefits of both FL (faster training/testing time) and SL (model split and training). Following the developments from FL to SL, and considering the importance of SL, this chapter provides extensive coverage of SL and its variants. The coverage includes fundamentals, existing findings, integration with privacy measures such as differential privacy, open problems, and code implementation.
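The division of work in SL can be made concrete with a small sketch. The following is a minimal illustration, not the chapter's implementation: it assumes a single client, a toy two-layer network written in plain Python, and a configuration in which the client shares labels with the server. The class names `Client` and `Server`, the function `train`, the layer sizes, and the synthetic data are all hypothetical. The point it shows is that only the activations at the split (and the gradients with respect to them) cross the client/server boundary; the raw inputs and each party's weights stay local.

```python
import math
import random


class Client:
    """Holds the raw inputs and the first (smaller) model portion."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_hidden)]

    def forward(self, x):
        # Cache the input and activations for the later backward pass.
        self.x = x
        self.h = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.W]
        return self.h  # only these activations leave the client

    def backward(self, grad_h, lr):
        # Continue backpropagation from the gradient returned by the server.
        for j, row in enumerate(self.W):
            g = grad_h[j] * (1.0 - self.h[j] ** 2)  # derivative of tanh
            for i in range(len(row)):
                row[i] -= lr * g * self.x[i]


class Server:
    """Holds the remaining model portion; never sees the raw inputs."""

    def __init__(self, n_hidden, rng):
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def train_step(self, h, y, lr):
        pred = sum(v * hj for v, hj in zip(self.v, h))
        err = pred - y
        # Gradient w.r.t. the received activations, computed before
        # the server updates its own weights.
        grad_h = [err * v for v in self.v]
        self.v = [v - lr * err * hj for v, hj in zip(self.v, h)]
        return 0.5 * err ** 2, grad_h


def train(epochs=200, lr=0.1, seed=0):
    """Run split training on a synthetic regression task (assumed data)."""
    rng = random.Random(seed)
    data = [([x1, x2], 0.5 * x1 - 0.3 * x2)
            for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]
    client, server = Client(2, 4, rng), Server(4, rng)
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            h = client.forward(x)                       # client -> server
            loss, grad_h = server.train_step(h, y, lr)  # server -> client
            client.backward(grad_h, lr)
            total += loss
        losses.append(total / len(data))
    return losses
```

Calling `train()` returns the mean loss per epoch, which can be inspected to confirm that the two portions learn jointly even though neither party ever holds the full model. A realistic setup would replace the direct method calls with network communication and would typically use an automatic-differentiation framework for both portions.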