In this paper, we present a blockwise optimization method for masking-based networks (BLOOM-Net) to train scalable speech enhancement networks. Here, we design our network with a residual learning scheme and train the internal separator blocks sequentially to obtain a scalable masking-based deep neural network for speech enhancement. Its scalability lets it dynamically adjust the run-time complexity depending on the test-time environment. To this end, we modularize our models so that they can flexibly accommodate varying needs for enhancement performance and constraints on resources, incurring only minimal memory and training overhead for the added scalability. Our experiments on speech enhancement demonstrate that the proposed blockwise optimization method achieves the desired scalability with only a slight performance degradation compared to corresponding models trained end-to-end.
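To make the training procedure concrete, below is a minimal PyTorch sketch of the blockwise optimization idea: separator blocks refine the latent representation residually, and each block is trained in sequence while previously trained components stay frozen, so inference can stop after any prefix of blocks. All names, layer choices, and hyperparameters here are illustrative placeholders under assumed conventions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BlockwiseMaskNet(nn.Module):
    """Hypothetical scalable masking network; block internals, mask head,
    and encoder/decoder are placeholders, not the paper's exact design."""
    def __init__(self, n_blocks=4, feat_dim=256):
        super().__init__()
        self.encoder = nn.Conv1d(1, feat_dim, kernel_size=16, stride=8)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv1d(feat_dim, feat_dim, 3, padding=1), nn.PReLU())
            for _ in range(n_blocks)
        )
        self.mask = nn.Conv1d(feat_dim, feat_dim, 1)
        self.decoder = nn.ConvTranspose1d(feat_dim, 1, kernel_size=16, stride=8)

    def forward(self, x, n_active):
        feat = self.encoder(x)
        h = feat
        for block in self.blocks[:n_active]:
            h = h + block(h)               # residual refinement per separator block
        m = torch.sigmoid(self.mask(h))    # masking-based separation
        return self.decoder(feat * m)

def train_blockwise(model, loader, loss_fn, epochs_per_block=1):
    """Sequentially optimize one separator block at a time, freezing the rest."""
    for b in range(len(model.blocks)):
        for p in model.parameters():
            p.requires_grad = False
        trainable = list(model.blocks[b].parameters()) + list(model.mask.parameters())
        if b == 0:  # assume encoder/decoder are learned with the first block only
            trainable += list(model.encoder.parameters()) + list(model.decoder.parameters())
        for p in trainable:
            p.requires_grad = True
        opt = torch.optim.Adam(trainable, lr=1e-3)
        for _ in range(epochs_per_block):
            for noisy, clean in loader:   # noisy/clean: (batch, 1, time)
                opt.zero_grad()
                est = model(noisy, n_active=b + 1)
                T = min(est.shape[-1], clean.shape[-1])  # align lengths after conv/deconv
                loss = loss_fn(est[..., :T], clean[..., :T])
                loss.backward()
                opt.step()
```

After training, a deployment with tight resource constraints can call the model with a small `n_active`, while a more capable device uses all blocks, trading compute for enhancement quality without retraining or storing separate models.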