Federated Learning (FL) is transforming the ML training ecosystem from a centralized over-the-cloud setting to distributed training over edge devices in order to strengthen data privacy. An essential but rarely studied challenge in FL is label deficiency at the edge. This problem is even more pronounced in FL than in centralized training because FL users are often reluctant to label their private data. Furthermore, due to the heterogeneous nature of the data at edge devices, it is crucial to develop personalized models. In this paper, we propose self-supervised federated learning (SSFL), a unified self-supervised and personalized federated learning framework, together with a series of algorithms under this framework that address these challenges. First, under the SSFL framework, we demonstrate that the standard FedAvg algorithm is compatible with recent breakthroughs in centralized self-supervised learning, such as SimSiam networks. Moreover, to handle data heterogeneity at the edge devices, we extend existing supervised personalization algorithms to the self-supervised setting. We further propose a novel personalized federated self-supervised learning algorithm, Per-SSFL, which balances personalization and consensus by carefully regulating the distance between the local and global representations of the data. To provide a comprehensive comparative analysis of all proposed algorithms, we also develop a distributed training system and a related evaluation protocol for SSFL. Our findings show that the gap in evaluation accuracy between supervised and unsupervised learning in FL is both small and reasonable. The performance comparison indicates that the representation regularization-based personalization method outperforms the other variants.
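The core idea of Per-SSFL described above, regulating the distance between local and global representations, can be illustrated with a minimal sketch. The function names, the cosine-distance choice, and the weighting parameter `lam` below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity, averaged over a batch of row-vector representations."""
    u_n = u / np.linalg.norm(u, axis=1, keepdims=True)
    v_n = v / np.linalg.norm(v, axis=1, keepdims=True)
    return 1.0 - np.mean(np.sum(u_n * v_n, axis=1))

def per_ssfl_objective(ssl_loss, local_repr, global_repr, lam=0.1):
    """Hypothetical per-client objective: self-supervised loss (e.g., from a
    SimSiam-style model) plus a penalty on local/global representation divergence."""
    return ssl_loss + lam * cosine_distance(local_repr, global_repr)
```

When the local and global representations coincide, the penalty vanishes and the objective reduces to the plain self-supervised loss; a larger `lam` pushes the client toward consensus, a smaller one toward personalization.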