Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models without centralizing their data. The cross-silo FL setting corresponds to the case of a few ($2$--$50$) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works have proposed representative datasets for cross-device FL, few realistic healthcare cross-silo FL datasets exist, which slows algorithmic research in this critical application. In this work, we propose a novel cross-silo dataset suite focused on healthcare, FLamby (Federated Learning AMple Benchmark of Your cross-silo strategies), to bridge the gap between the theory and practice of cross-silo FL. FLamby encompasses 7 healthcare datasets with natural splits, covering multiple tasks, modalities, and data volumes, each accompanied by baseline training code. As an illustration, we additionally benchmark standard FL algorithms on all datasets. Our flexible and modular suite allows researchers to easily download datasets, reproduce results, and reuse the different components in their own research. FLamby is available at~\url{www.github.com/owkin/flamby}.