In this paper, we present BLOOM-Net, a blockwise optimization method for training scalable masking-based speech enhancement networks. We design our network with a residual learning scheme and train the internal separator blocks sequentially to obtain a scalable masking-based deep neural network for speech enhancement. Its scalability lets it adjust the run-time complexity to the test-time resource constraints: once deployed, the model can alter its complexity dynamically depending on the test-time environment. To this end, we modularize our models so that they can flexibly accommodate varying needs for enhancement performance and resource constraints, incurring minimal memory or training overhead from the added scalability. Our experiments on speech enhancement demonstrate that the proposed blockwise optimization method achieves the desired scalability with only a slight performance degradation compared to corresponding models trained end-to-end.
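To make the two core ideas concrete, the sketch below illustrates (a) a masking network whose separator blocks residually refine the mask, so inference can stop after any prefix of blocks, and (b) blockwise training, where each new block is optimized while earlier blocks stay frozen. This is a minimal illustration under assumed details, not the authors' implementation: the block architecture, feature dimension, residual mask-refinement rule, and helper names (`SeparatorBlock`, `ScalableMaskNet`, `train_blockwise`) are all hypothetical.

```python
# Minimal sketch of a scalable masking network with blockwise (sequential)
# training. All architectural details here are illustrative assumptions.
import torch
import torch.nn as nn

class SeparatorBlock(nn.Module):
    """One mask-estimation block; later blocks residually refine the mask."""
    def __init__(self, feat_dim=257, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, x, prev_mask):
        # Residual refinement: add a correction to the previous block's mask
        # (an assumed formulation of the paper's residual learning scheme).
        return torch.sigmoid(prev_mask + self.net(x))

class ScalableMaskNet(nn.Module):
    def __init__(self, num_blocks=3, feat_dim=257):
        super().__init__()
        self.blocks = nn.ModuleList(
            SeparatorBlock(feat_dim) for _ in range(num_blocks))

    def forward(self, noisy_mag, n_active):
        # Run only the first `n_active` blocks: this is the run-time
        # scalability knob that trades complexity for enhancement quality.
        mask = torch.zeros_like(noisy_mag)
        for blk in self.blocks[:n_active]:
            mask = blk(noisy_mag, mask)
        return mask * noisy_mag  # enhanced magnitude spectrogram

def train_blockwise(model, loader, loss_fn, epochs_per_block=1):
    # Sequential optimization: train one block at a time while all
    # previously trained blocks stay frozen.
    for k in range(len(model.blocks)):
        for p in model.parameters():
            p.requires_grad = False
        for p in model.blocks[k].parameters():
            p.requires_grad = True
        opt = torch.optim.Adam(model.blocks[k].parameters(), lr=1e-3)
        for _ in range(epochs_per_block):
            for noisy_mag, clean_mag in loader:
                opt.zero_grad()
                est = model(noisy_mag, n_active=k + 1)  # prefix up to block k
                loss_fn(est, clean_mag).backward()
                opt.step()
```

In this sketch, test-time scaling is simply `model(noisy_mag, n_active=k)` for any k up to the number of trained blocks; since each block was trained to improve on the frozen prefix before it, every prefix remains a usable enhancement model on its own.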