Federated Learning (FL) is an emerging direction in distributed machine learning (ML) that enables in-situ model training and testing on edge data. Despite having the same end goals as traditional ML, FL executions differ significantly in scale, spanning thousands to millions of participating devices. As a result, data characteristics and device capabilities vary widely across clients. Yet, existing efforts randomly select FL participants, which leads to poor model and system efficiency. In this paper, we propose Kuiper to improve the performance of federated training and testing with guided participant selection. With an aim to improve time-to-accuracy performance in model training, Kuiper prioritizes the use of those clients who have both data that offers the greatest utility in improving model accuracy and the capability to run training quickly. To enable FL developers to interpret their results in model testing, Kuiper enforces their requirements on the distribution of participant data while improving the duration of federated testing by cherry-picking clients. Our evaluation shows that, compared to existing participant selection mechanisms, Kuiper improves time-to-accuracy performance by 1.2x-14.1x and final model accuracy by 1.3%-9.8%, while efficiently enforcing developer requirements on data distributions at the scale of millions of clients.
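As a rough illustration of guided participant selection for training, the sketch below ranks clients by a score that combines data utility with speed and picks the top `k`. The abstract does not say how the two factors are combined, so everything here is an assumption made for illustration: the `Client` fields, the `preferred_duration` threshold, the `penalty` exponent, and the multiplicative slowdown penalty are hypothetical and should not be read as Kuiper's actual scoring rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Client:
    client_id: int
    data_utility: float    # hypothetical score: higher = more useful for accuracy
    round_duration: float  # hypothetical estimate of seconds to finish one round


def client_score(client: Client, preferred_duration: float, penalty: float = 2.0) -> float:
    """Assumed scoring rule: keep the data utility for fast clients and
    scale it down for clients slower than the preferred round duration."""
    if client.round_duration <= preferred_duration:
        return client.data_utility
    slowdown = preferred_duration / client.round_duration  # in (0, 1)
    return client.data_utility * slowdown ** penalty


def select_participants(clients: List[Client], k: int, preferred_duration: float) -> List[Client]:
    """Return the k highest-scoring clients (instead of a random sample)."""
    ranked = sorted(
        clients,
        key=lambda c: client_score(c, preferred_duration),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        Client(0, data_utility=10.0, round_duration=5.0),   # useful and fast
        Client(1, data_utility=10.0, round_duration=20.0),  # useful but slow
        Client(2, data_utility=4.0, round_duration=5.0),    # less useful, fast
        Client(3, data_utility=1.0, round_duration=1.0),    # fast, little utility
    ]
    for c in select_participants(pool, k=2, preferred_duration=10.0):
        print(c.client_id, client_score(c, preferred_duration=10.0))
```

In this toy pool, the slow client 1 has the same data utility as client 0 but is penalized for taking twice the preferred duration, so it ranks below the faster client 2. Random selection, by contrast, would pick it as often as any other client.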