Traditional machine learning relies on a centralized data pipeline, i.e., data are provided to a central server for model training. In many applications, however, data are inherently fragmented, and their decentralized nature poses a fundamental challenge to collaboration: sending all decentralized datasets to a central server raises serious privacy concerns. Although privacy-preserving machine learning frameworks such as federated learning have been proposed to address this critical issue, most state-of-the-art frameworks are still built in a centralized way, in which a central client is needed to collect model information (instead of the data itself) from every other client and distribute it back, creating a communication bottleneck and a single point of vulnerability in the event of a failure at, or attack on, the central client. Here we propose a principled decentralized federated learning algorithm (DeceFL), which does not require a central client and relies only on local information transmission between clients and their neighbors, yielding a fully decentralized learning framework. We further prove that, when the loss function is smooth and strongly convex, every client reaches the global minimum with zero performance gap and achieves the same convergence rate $O(1/T)$ (where $T$ is the number of iterations in gradient descent) as centralized federated learning. Finally, we apply the proposed algorithm to a number of tasks with both convex and nonconvex loss functions, demonstrating its applicability to a wide range of real-world medical and industrial applications.
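To make the neighbor-only update concrete, the following is a minimal, hypothetical sketch of one decentralized round in the spirit of the framework described above: each client mixes models with its graph neighbors through a doubly stochastic mixing matrix and then takes a local gradient step, with no central client involved. The mixing matrix `W`, the diminishing step size, and all function names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def decentralized_round(weights, grads, W, lr):
    """One synchronous round for all clients (hypothetical sketch).

    weights: (n_clients, dim) current local models
    grads:   (n_clients, dim) gradients of each client's local loss
    W:       (n_clients, n_clients) doubly stochastic mixing matrix;
             W[i, j] > 0 only if j is a neighbor of i (or j == i)
    lr:      step size for this round
    """
    # Neighbor-weighted averaging followed by a local gradient step;
    # only rows of W with nonzero entries (i.e., graph neighbors) are used.
    return W @ weights - lr * grads

# Toy problem: 4 clients on a ring graph with local quadratic losses
# f_i(w) = 0.5 * ||w - c_i||^2, so grad f_i(w) = w - c_i and the
# global minimizer is mean(c_i).
rng = np.random.default_rng(0)
n_clients, dim, T = 4, 3, 2000
c = rng.normal(size=(n_clients, dim))
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])  # ring graph, doubly stochastic

w = np.zeros((n_clients, dim))
for t in range(T):
    w = decentralized_round(w, w - c, W, lr=1.0 / (t + 1))

# Every client's model approaches the same global minimum; the gap
# shrinks as T grows, consistent with exact convergence for smooth,
# strongly convex losses.
print(np.abs(w - c.mean(axis=0)).max())
```

In this toy run the clients reach consensus at the global minimizer despite each one communicating only with its two ring neighbors, illustrating why no central client is needed for collecting and redistributing model information.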