Recently, the ever-growing demand for privacy-oriented machine learning has motivated researchers to develop federated and decentralized learning techniques, which allow individual clients to train models collaboratively without disclosing their private datasets. However, widespread adoption has been limited in domains that rely on high levels of user trust, where assessing data compatibility is essential. In this work, we define and address low interoperability induced by underlying inconsistencies in client data in federated learning for tabular data. The proposed method, iFedAvg, builds on federated averaging by adding local element-wise affine layers, which allow for a personalized and granular understanding of the collaborative learning process. This enables the detection of outlier datasets in the federation, as well as learning to compensate for local data distribution shifts, without sharing any original data. We evaluate iFedAvg on several public benchmarks and on a previously unstudied collection of real-world datasets from the 2014–2016 West African Ebola epidemic, which together form the largest such dataset in the world. In all evaluations, iFedAvg achieves competitive average performance with negligible overhead. It additionally shows substantial improvement on outlier clients, highlighting increased robustness to individual dataset shifts. Most importantly, our method provides valuable client-specific insights at a fine-grained level to guide interoperable federated learning.
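The abstract does not specify the model architecture. The following is therefore only a minimal sketch of the general idea under our own assumptions, not the authors' implementation. We assume a linear model as the shared part that is averaged. We assume the local element-wise affine layer is a per-client scale and shift applied to the input features, and that it is never sent to the server. Training uses plain per-sample SGD. As an outlier score, we take the distance of each client's affine layer from the identity transform. All names (`Client`, `local_update`, `fedavg_round`, `outlier_score`, `make_client`) and the synthetic data are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical federation member: private tabular data plus a local affine layer."""
    xs: list  # private feature rows; never sent to the server
    ys: list  # private regression targets
    scale: list = field(init=False)  # local element-wise scale a_k (never averaged)
    shift: list = field(init=False)  # local element-wise shift b_k (never averaged)

    def __post_init__(self):
        d = len(self.xs[0])
        # Initialise the affine layer to the identity transform: a_k = 1, b_k = 0.
        self.scale = [1.0] * d
        self.shift = [0.0] * d


def local_update(client, w, c, lr=0.01, epochs=5):
    """Per-sample SGD on 0.5 * (w . (a * x + b) + c - y)^2.

    The shared weights (w, c) are copied, trained and returned for averaging;
    the client's affine parameters are updated in place and stay local.
    """
    w = list(w)
    for _ in range(epochs):
        for x, y in zip(client.xs, client.ys):
            z = [a * xi + b for a, xi, b in zip(client.scale, x, client.shift)]
            err = sum(wi * zi for wi, zi in zip(w, z)) + c - y
            for j in range(len(w)):
                grad_z = err * w[j]  # dL/dz_j, computed before w[j] changes
                client.scale[j] -= lr * grad_z * x[j]
                client.shift[j] -= lr * grad_z
                w[j] -= lr * err * z[j]
            c -= lr * err
    return w, c


def fedavg_round(clients, w, c):
    """One FedAvg round that averages only the shared parameters (w, c)."""
    total = sum(len(cl.xs) for cl in clients)
    new_w, new_c = [0.0] * len(w), 0.0
    for cl in clients:
        w_k, c_k = local_update(cl, w, c)
        weight = len(cl.xs) / total  # weight each client by its sample count
        new_w = [nw + weight * wk for nw, wk in zip(new_w, w_k)]
        new_c += weight * c_k
    return new_w, new_c


def outlier_score(client):
    """Euclidean distance of the local affine layer from the identity transform."""
    sq = sum((a - 1.0) ** 2 + b ** 2 for a, b in zip(client.scale, client.shift))
    return sq ** 0.5


def make_client(n, offset, rng):
    """Synthetic data y = 2*x0 - x1 + 1; `offset` shifts how x0 is recorded locally."""
    xs, ys = [], []
    for _ in range(n):
        x0, x1 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        xs.append([x0 + offset, x1])
        ys.append(2.0 * x0 - x1 + 1.0)
    return Client(xs, ys)


rng = random.Random(0)
# Two consistent clients and one whose first feature is recorded with a +3 shift.
clients = [make_client(50, 0.0, rng), make_client(50, 0.0, rng), make_client(50, 3.0, rng)]
w, c = [0.0, 0.0], 0.0
for _ in range(30):
    w, c = fedavg_round(clients, w, c)
for k, cl in enumerate(clients):
    print(f"client {k}: affine deviation from identity = {outlier_score(cl):.3f}")
```

In this toy setup, one would expect the third client's affine layer to drift furthest from the identity, because it alone must absorb the recording shift while the shared weights stay fitted to the consistent clients. The actual method may differ in model choice, in where the affine layers are placed, and in how outliers are flagged.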