We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID data distributed across multiple clients. We simulate a real-world scenario where each client only has access to a few noisy recordings from a limited, disjoint set of speakers (hence non-IID). Each client trains its model in isolation using mixture invariant training (MixIT) while periodically providing updates to a central server. Our experiments show that our approach achieves enhancement performance competitive with IID training on a single device, and that convergence speed and overall performance can be further improved using transfer learning on the server side. Moreover, we show that we can effectively combine updates from clients trained locally with supervised and unsupervised losses. We also release a new dataset, LibriFSD50K, and its creation recipe in order to facilitate FL research for source separation problems.
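To make the training protocol concrete, below is a minimal sketch of one federated round in the style the abstract describes: each client minimizes a MixIT-style loss on mixtures of mixtures, and the server averages the resulting parameters. This assumes a PyTorch separation model; the helper names (`neg_snr`, `mixit_loss`, `fed_avg`) and the plain FedAvg-style aggregation are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of one FEDENHANCE-style federated round (illustrative, not the
# paper's exact implementation). Assumes `model(x)` maps a batch of
# mixtures (batch, time) to estimated sources (batch, n_src, time).
import copy
import itertools

import torch


def neg_snr(est, ref, eps=1e-8):
    # Negative signal-to-noise ratio between an estimate and a reference,
    # computed per example: shape (batch,).
    err = ref - est
    return -10 * torch.log10(ref.pow(2).sum(-1) / (err.pow(2).sum(-1) + eps))


def mixit_loss(model, x1, x2):
    # Mixture invariant training: separate the mixture of two mixtures and
    # search over binary assignments of the estimated sources back to
    # x1 / x2, keeping the best assignment per example.
    est = model(x1 + x2)  # (batch, n_src, time)
    n_src = est.shape[1]
    losses = []
    for mask in itertools.product([0, 1], repeat=n_src):
        a = torch.tensor(mask, dtype=est.dtype, device=est.device).view(1, -1, 1)
        losses.append(
            neg_snr((a * est).sum(1), x1) + neg_snr(((1 - a) * est).sum(1), x2)
        )
    return torch.stack(losses).min(dim=0).values.mean()


def fed_avg(client_states):
    # Server-side aggregation: plain parameter averaging across clients
    # (FedAvg-style; assumed here for illustration).
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in client_states]).mean(0)
    return avg
```

In one round, each client would load the current server weights, run a few local optimizer steps on `mixit_loss` over its own noisy recordings, and return its state dict; the server then calls `fed_avg` and broadcasts the result. A client with labeled data could substitute a supervised separation loss in the same loop, matching the abstract's claim that supervised and unsupervised client updates can be combined.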