We present FedScale, a diverse set of challenging and realistic benchmark datasets to facilitate scalable, comprehensive, and reproducible federated learning (FL) research. FedScale datasets are large-scale and encompass a diverse range of important FL tasks, such as image classification, object detection, language modeling, speech recognition, and reinforcement learning. For each dataset, we provide a unified evaluation protocol using realistic data splits and evaluation metrics. To meet the pressing need for reproducing realistic FL at scale, we have also built an efficient evaluation platform to simplify and standardize the process of FL experimental setup and model evaluation. Our evaluation platform provides flexible APIs for implementing new FL algorithms and incorporating new execution backends with minimal developer effort. Finally, we perform in-depth benchmark experiments on these datasets. Our experiments suggest fruitful opportunities in heterogeneity-aware co-optimizations of the system and statistical efficiency under realistic FL characteristics. FedScale is open-source with permissive licenses and actively maintained, and we welcome feedback and contributions from the community.