We present FedScale, a diverse set of challenging and realistic benchmark datasets to facilitate scalable, comprehensive, and reproducible federated learning (FL) research. FedScale datasets are large-scale, encompassing a diverse range of important FL tasks, such as image classification, object detection, language modeling, speech recognition, and reinforcement learning. For each dataset, we provide a unified evaluation protocol using realistic data splits and evaluation metrics. To meet the pressing need for reproducing realistic FL at scale, we have also built an efficient evaluation platform to simplify and standardize the process of FL experimental setup and model evaluation. Our evaluation platform provides flexible APIs to implement new FL algorithms and to integrate new execution backends with minimal developer effort. Finally, we perform in-depth benchmark experiments on these datasets. Our experiments suggest that FedScale presents significant challenges in the heterogeneity-aware co-optimization of system and statistical efficiency under realistic FL characteristics, indicating fruitful opportunities for future research. FedScale is open-source with permissive licenses and actively maintained, and we welcome feedback and contributions from the community.