Cross-silo federated learning (FL) has become a promising tool for machine learning applications in healthcare: it allows hospitals and institutions to train models on sufficient data while the data is kept private. To make FL models robust to heterogeneous data across clients, most efforts focus on personalizing a model for each client; however, the latent relationships among clients' data are ignored. In this work, we study a special non-IID FL problem, called Domain-mixed FL, in which each client's data distribution is assumed to be a mixture of several predefined domains. Recognizing the diversity across domains and the similarity within each domain, we propose a novel method, FedDAR, which learns a domain-shared representation and domain-wise personalized prediction heads in a decoupled manner. For a simplified linear-regression setting, we theoretically prove that FedDAR enjoys a linear convergence rate. For general settings, we perform extensive empirical studies on both synthetic and real-world medical datasets, which demonstrate its superiority over prior FL methods.
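The core idea of the decoupled scheme described above can be sketched in a centralized, single-machine toy form. This is an illustrative assumption, not the authors' implementation: it uses alternating least squares on a synthetic linear-regression task, first fitting one prediction head per domain with the shared representation frozen, then refitting the shared representation with the heads frozen.

```python
import numpy as np

# Hypothetical sketch of the decoupled representation/head idea (all names
# and the alternating-least-squares updates are illustrative assumptions,
# not the FedDAR algorithm itself).
rng = np.random.default_rng(0)
d_in, d_rep, n_domains, n = 10, 5, 3, 200

# Ground truth: one shared representation, one prediction head per domain.
W_true = rng.normal(size=(d_in, d_rep))
heads_true = rng.normal(size=(n_domains, d_rep))

# Synthetic samples; each sample is drawn from one of the predefined domains,
# mimicking client data that mixes several domains.
X = rng.normal(size=(n, d_in))
dom = rng.integers(0, n_domains, size=n)
y = np.einsum("ir,ir->i", X @ W_true, heads_true[dom])

def mse(W, heads):
    pred = np.einsum("ir,ir->i", X @ W, heads[dom])
    return float(np.mean((pred - y) ** 2))

W = rng.normal(size=(d_in, d_rep))
heads = np.zeros((n_domains, d_rep))
initial = mse(W, heads)

for _ in range(10):
    # Step 1: with the representation frozen, fit each domain-wise head.
    Z = X @ W
    for k in range(n_domains):
        m = dom == k
        heads[k] = np.linalg.lstsq(Z[m], y[m], rcond=None)[0]
    # Step 2: with heads frozen, refit the shared representation
    # (y is linear in the entries of W once the heads are fixed).
    A = np.einsum("ip,ir->ipr", X, heads[dom]).reshape(n, -1)
    W = np.linalg.lstsq(A, y, rcond=None)[0].reshape(d_in, d_rep)

final = mse(W, heads)
print(f"MSE: {initial:.3f} -> {final:.6f}")
```

In the federated version of such a scheme, the representation update would be aggregated across clients while each head is trained only on the data belonging to its domain; this toy collapses both steps onto one machine to show the decoupling alone.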