Federated learning (FL) is a privacy-preserving machine learning method that has been proposed to allow models to be trained on data from many different clients, without those clients having to transfer all their data to a central server. There has as yet been relatively little consideration of FL or other privacy-preserving methods in audio. In this paper, we investigate using FL for a sound event detection task using audio from the FSD50K dataset. Audio is split into clients based on uploader metadata. This results in highly imbalanced subsets of data between clients, noted as a key issue in FL scenarios. A series of models is trained using `high-volume' clients that contribute 100 audio clips or more, testing the effects of varying FL parameters, followed by an additional model trained using all clients with no minimum audio contribution. It is shown that FL models trained using the high-volume clients can perform similarly to a centrally-trained model, though results are considerably noisier than would typically be expected for centrally-trained models. The FL model trained using all clients performs considerably worse than the centrally-trained model.
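The training setup described above can be sketched in miniature. The following is an illustrative federated averaging (FedAvg) simulation, not the paper's actual implementation: the 1-D "model", the `local_train`/`fedavg_round` helpers, and the toy client data are all assumptions, chosen only to show how the server aggregates client updates weighted by each client's clip count, the mechanism that makes highly imbalanced uploader-based splits matter.

```python
# Minimal FedAvg sketch with imbalanced clients (illustrative only; the
# paper trains sound event detection models on FSD50K, not this toy task).
import random

def local_train(w, data, lr=0.1, steps=5):
    """One client's local update: gradient steps on 0.5*(w - x)^2,
    pulling the scalar 'model' w toward the client's data mean."""
    for _ in range(steps):
        grad = sum(w - x for x in data) / len(data)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """One communication round: each client trains locally, then the
    server averages the results weighted by client data volume."""
    total = sum(len(d) for d in clients)
    return sum(len(d) * local_train(w_global, d) for d in clients) / total

random.seed(0)
# Two 'high-volume' clients (>= 100 clips) and one tiny client,
# mimicking the imbalanced uploader-based split.
clients = [[random.gauss(1.0, 0.5) for _ in range(n)] for n in (200, 150, 5)]

w = 0.0
for _ in range(20):  # 20 communication rounds
    w = fedavg_round(w, clients)
```

After enough rounds, `w` converges to the volume-weighted mean of the client data, so the tiny client's contribution is almost entirely dominated by the two high-volume clients; removing the minimum-contribution threshold (as in the all-clients model above) adds many such small, noisy updates.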