Federated learning is a distributed paradigm that allows multiple parties to collaboratively train deep models without exchanging the raw data. However, the data distribution among clients is naturally non-i.i.d., which leads to severe degradation of the learnt model. The primary goal of this paper is to develop a robust federated learning algorithm to address feature shift in clients' samples, which can be caused by various factors, e.g., acquisition differences in medical imaging. To reach this goal, we propose FedFA to tackle federated learning from a distinct perspective of federated feature augmentation. FedFA is based on a major insight that each client's data distribution can be characterized by statistics (i.e., mean and standard deviation) of latent features; and it is likely to manipulate these local statistics globally, i.e., based on information in the entire federation, to let clients have a better sense of the underlying distribution and therefore alleviate local data bias. Based on this insight, we propose to augment each local feature statistic probabilistically based on a normal distribution, whose mean is the original statistic and variance quantifies the augmentation scope. Key to our approach is the determination of a meaningful Gaussian variance, which is accomplished by taking into account not only biased data of each individual client, but also underlying feature statistics characterized by all participating clients. We offer both theoretical and empirical justifications to verify the effectiveness of FedFA. Our code is available at https://github.com/tfzhou/FedFA.
翻译:联邦学习是一种分布式模式,允许多方合作培训深层次模型,而不交换原始数据,然而,客户之间的数据分配自然是非i.i.d.d.,这导致所学模型严重退化。本文件的主要目标是开发一个强有力的联邦学习算法,以解决客户样本特征变化的问题,这种变化可能由多种因素造成,例如医疗成像方面的采购差异。为了实现这一目标,我们建议联邦食品与食品联合会从分化特性增强的不同角度出发,处理联合会的深层次学习问题。联邦食品与食品联合会基于一个重大的洞察,每个客户的数据分配都以潜在特征的统计数据(即,平均值和标准偏差)为特征的特征为特征;而本文件的主要目的是在全球范围操作这些本地统计数据,即根据整个联邦体系的信息,让客户更好地了解基本分布情况,从而减轻当地数据偏差。基于这一洞察,我们提议根据正常分布,加强每个本地特征的概率统计,其意思是原始统计和差异变异度范围。我们采用的方法,即以参与度的客户的每个客户的统计方式,其基础是真实性统计,我们通过货币基金账户的每个客户的统计分析,其格式,其基础是真实性定义,我们对每个客户的统计的准确性进行。