Federated Learning (FL) is an emerging direction in distributed machine learning (ML) that enables in-situ model training and testing on edge data. Despite having the same end goals as traditional ML, FL executions differ significantly in scale, spanning thousands to millions of participating devices. As a result, data characteristics and device capabilities vary widely across clients. Yet, existing efforts randomly select FL participants, which leads to poor model and system efficiency. In this paper, we propose Oort to improve the performance of federated training and testing with guided participant selection. With an aim to improve time-to-accuracy performance in model training, Oort prioritizes the use of those clients who have both data that offers the greatest utility in improving model accuracy and the capability to run training quickly. To enable FL developers to interpret their results in model testing, Oort enforces their requirements on the distribution of participant data while reducing the duration of federated testing by cherry-picking clients. Our evaluation shows that, compared to existing participant selection mechanisms, Oort improves time-to-accuracy performance by 1.2x-14.1x and final model accuracy by 1.3%-9.8%, while efficiently enforcing developer-specified model testing criteria at the scale of millions of clients.
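The guided selection policy described above, which prioritizes clients with high statistical utility and fast training times, can be sketched as follows. This is a minimal illustration, not Oort's exact scoring rule: the function names, the root-mean loss aggregate, and the `alpha`/`deadline` penalty parameters are assumptions made for the example.

```python
import math

def client_utility(sample_losses, train_time, deadline, alpha=2.0):
    """Illustrative client-utility score: a statistical term (loss-based
    aggregate scaled by the number of local samples) discounted by a
    system penalty when the client's estimated training time exceeds
    the developer-set round deadline. Parameters are hypothetical."""
    n = len(sample_losses)
    stat_utility = n * math.sqrt(sum(l * l for l in sample_losses) / n)
    # Penalize slow clients; clients finishing within the deadline are unaffected.
    penalty = (deadline / train_time) ** alpha if train_time > deadline else 1.0
    return stat_utility * penalty

def select_participants(clients, deadline, k):
    """Pick the top-k clients by utility (pure exploitation; a real
    selector would also explore clients with unknown utility)."""
    scored = sorted(
        clients,
        key=lambda c: client_utility(c["losses"], c["time"], deadline),
        reverse=True,
    )
    return scored[:k]
```

Under this sketch, a client with high training loss and a fast device outranks both a low-loss fast client and a high-loss slow one, which captures the trade-off between data utility and system speed that guides the selection.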