Federated learning is generally used in tasks where labels are readily available (e.g., next-word prediction). Relaxing this constraint requires the design of unsupervised learning techniques that can support desirable properties for federated training: robustness to statistical/systems heterogeneity, scalability with the number of participants, and communication efficiency. Prior work on this topic has focused on directly extending centralized self-supervised learning techniques, which are not designed to have the properties listed above. To address this situation, we propose Orchestra, a novel unsupervised federated learning technique that exploits the federation's hierarchy to orchestrate a distributed clustering task and enforce a globally consistent partitioning of clients' data into discriminable clusters. We show that the algorithmic pipeline in Orchestra guarantees good generalization performance under a linear probe, allowing it to outperform alternative techniques in a broad range of conditions, including variation in heterogeneity, number of clients, participation ratio, and local epochs.
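The hierarchy-driven clustering idea described above can be sketched in miniature: clients cluster their own data locally and share only their centroids, the server clusters those centroids into global clusters, and clients then partition their data against the shared global centroids. This is an illustrative assumption-laden sketch, not Orchestra's implementation (the paper clusters learned representations inside a self-supervised training loop); the two-client setup, plain k-means, and all variable names here are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        # Recompute centroids from the current assignment.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Two statistically heterogeneous clients with data around different modes
# (hypothetical toy data standing in for client datasets).
rng = np.random.default_rng(1)
clients = [rng.normal(loc=0.0, size=(100, 2)),
           rng.normal(loc=5.0, size=(100, 2))]

# Step 1: each client clusters locally; only centroids leave the device,
# which keeps communication cheap relative to sending raw data.
local_centroids = [kmeans(X, k=4, seed=i)[0] for i, X in enumerate(clients)]

# Step 2: the server clusters the pooled local centroids into global clusters.
global_centroids, _ = kmeans(np.vstack(local_centroids), k=2, seed=0)

# Step 3: every client assigns its data to the same global centroids,
# yielding a globally consistent partition across the federation.
assignments = [
    np.argmin(((X[:, None] - global_centroids[None]) ** 2).sum(-1), axis=1)
    for X in clients
]
```

Because only centroids (not raw samples) are exchanged, the per-round communication cost scales with the number of clusters rather than the dataset size, which is one way a scheme of this shape can stay communication-efficient.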