Federated Learning is a distributed machine learning paradigm dealing with decentralized and personal datasets. Since data reside on devices such as smartphones and virtual assistants, labeling is entrusted to the clients, or labels are extracted in an automated way. In the case of audio data specifically, acquiring semantic annotations can be prohibitively expensive and time-consuming. As a result, an abundance of audio data remains unlabeled and unexploited on users' devices. Most existing federated learning approaches focus on supervised learning without harnessing this unlabeled data. In this work, we study the problem of semi-supervised learning of audio models via self-training in conjunction with federated learning. We propose FedSTAR, which exploits large-scale on-device unlabeled data to improve the generalization of audio recognition models. We further demonstrate that self-supervised pre-trained models can accelerate the training of on-device models, enabling convergence within significantly fewer training rounds. We conduct experiments on diverse public audio classification datasets and investigate the performance of our models under varying percentages of labeled and unlabeled data. Notably, we show that with as little as 3% of the data labeled, FedSTAR improves the recognition rate by an average of 13.28% over the fully supervised federated model.
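Since the abstract describes self-training combined with federated learning only at a high level, the following is a minimal runnable sketch of the general recipe: each client pseudo-labels the unlabeled examples on which the current global model is confident, trains locally on its labeled plus pseudo-labeled data, and the server averages the resulting weights (plain FedAvg). The toy two-layer network, the synthetic features, the 0.9 confidence threshold, and the equal-weight averaging are all illustrative assumptions, not FedSTAR's actual implementation.

```python
# Illustrative sketch of federated self-training with pseudo-labeling.
# All hyperparameters, the model, and the data here are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

THRESHOLD = 0.9                    # assumed confidence cutoff for pseudo-labels
ROUNDS, LOCAL_EPOCHS, NUM_CLIENTS = 5, 2, 4

def make_model():
    # Stand-in for an audio classifier operating on precomputed features
    # (e.g., spectrogram embeddings); 10 output classes assumed.
    return nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

def local_update(global_model, x_lab, y_lab, x_unlab):
    """One client's round: self-train on labeled + confidently pseudo-labeled data."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    with torch.no_grad():                       # pseudo-label the unlabeled pool
        probs = F.softmax(model(x_unlab), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf > THRESHOLD                 # keep only confident predictions
    x = torch.cat([x_lab, x_unlab[keep]])
    y = torch.cat([y_lab, pseudo[keep]])
    for _ in range(LOCAL_EPOCHS):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    return model.state_dict()

def fed_avg(states):
    """Server step: average client weights (equal client weighting assumed)."""
    avg = copy.deepcopy(states[0])
    for k in avg:
        avg[k] = torch.stack([s[k].float() for s in states]).mean(dim=0)
    return avg

global_model = make_model()
# Each client holds a small labeled set and a larger unlabeled pool (synthetic).
clients = [(torch.randn(8, 64), torch.randint(0, 10, (8,)), torch.randn(40, 64))
           for _ in range(NUM_CLIENTS)]
for r in range(ROUNDS):
    states = [local_update(global_model, *c) for c in clients]
    global_model.load_state_dict(fed_avg(states))
    print(f"round {r + 1} complete")
```

In this sketch the confidence threshold controls the trade-off the abstract alludes to: a higher threshold admits fewer but cleaner pseudo-labels, while a lower one uses more of the unlabeled pool at the risk of reinforcing the model's own mistakes.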