Federated Learning (FL), wherein multiple institutions collaboratively train a machine learning model without sharing data, is becoming popular. Participating institutions might not contribute equally: some contribute more data, some contribute better-quality data, and some contribute more diverse data. To fairly rank the contributions of different institutions, the Shapley value (SV) has emerged as the method of choice. Exact SV computation is prohibitively expensive, especially when there are hundreds of contributors, so existing SV computation techniques rely on approximations. However, in healthcare, where the number of contributing institutions is likely not of a colossal scale, computing exact SVs is exorbitantly expensive but not impossible. For such settings, we propose an efficient SV computation technique called SaFE (Shapley Value for Federated Learning using Ensembling). We empirically show that SaFE computes values that are close to exact SVs, and that it performs better than current SV approximations. This is particularly relevant in medical imaging settings, where heterogeneity across institutions is widespread and fast, accurate data valuation is required to determine the contribution of each participant in multi-institutional collaborative learning.
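To illustrate why exact SV computation becomes intractable, the following is a minimal sketch of the standard exact Shapley value formula via coalition enumeration. This is generic illustration code, not the paper's SaFE method; the function names and the toy additive utility are assumptions introduced here for clarity.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating every coalition.

    The loop visits all 2^(n-1) coalitions per player, so the cost
    grows exponentially in n: feasible for a handful of institutions,
    prohibitive for hundreds.
    """
    n = len(players)
    values = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                # Marginal contribution of p to this coalition.
                marginal = value_fn(set(coalition) | {p}) - value_fn(set(coalition))
                values[p] += weight * marginal
    return values

# Toy additive utility (hypothetical): a coalition's value is the sum
# of its members' "data quality" scores. For additive games the Shapley
# value of each player equals its own score.
quality = {"A": 3.0, "B": 1.0, "C": 1.0}
sv = shapley_values(list(quality), lambda S: sum(quality[p] for p in S))
```

In this additive toy game each institution's Shapley value recovers exactly its own contribution (A gets 3.0, B and C get 1.0 each), and the values sum to the grand-coalition utility. In FL the utility of a coalition is instead the test performance of a model trained on that coalition's data, which is what makes each evaluation, and hence exact enumeration, so expensive.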