In this paper, we present a scheme for extending deep neural network-based multiplicative maskers to deep subband filters for speech restoration in the time-frequency domain. The resulting method can be applied generically to any deep neural network that provides masks in the time-frequency domain, while requiring only a few more trainable parameters and a computational overhead that is negligible for state-of-the-art neural networks. We demonstrate that the resulting deep subband filtering scheme outperforms multiplicative masking for dereverberation, while leaving the denoising performance virtually unchanged. We argue that this is because deep subband filtering in the time-frequency domain fits the subband approximation often assumed in the dereverberation literature, whereas multiplicative masking corresponds to the narrowband approximation generally employed in denoising.
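To make the contrast concrete, the following is a minimal NumPy sketch of the two operations, not the implementation used in this work. It assumes that multiplicative masking applies one complex gain per time-frequency bin, and that a deep subband filter applies, in each frequency band, a short time-varying filter over the current and past frames. The function names, the causal tap layout, the number of taps `K`, and the array shapes are illustrative assumptions; in practice the mask or filter coefficients would be predicted by the neural network rather than drawn at random.

```python
import numpy as np


def multiplicative_mask(Y, M):
    """Narrowband-style masking: one complex gain per time-frequency bin.

    Y: complex STFT of the degraded signal, shape (F, T)
    M: complex mask, shape (F, T)
    """
    return M * Y


def subband_filter(Y, H):
    """Subband filtering: a time-varying FIR filter along frames in each band.

    Y: complex STFT of the degraded signal, shape (F, T)
    H: complex filter taps, shape (K, F, T); H[k, f, t] weights Y[f, t - k]
       (assumed causal layout, chosen for illustration)
    """
    K, F, T = H.shape
    out = np.zeros((F, T), dtype=complex)
    for k in range(K):
        # Delay Y by k frames, zero-padding the first k frames.
        delayed = np.zeros_like(Y)
        delayed[:, k:] = Y[:, :T - k]
        out += H[k] * delayed
    return out


# Toy data standing in for an STFT and for network outputs (assumed shapes).
rng = np.random.default_rng(0)
F, T, K = 4, 10, 3
Y = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
H = rng.standard_normal((K, F, T)) + 1j * rng.standard_normal((K, F, T))

S_mask = multiplicative_mask(Y, H[0])
S_filt = subband_filter(Y, H)

# With a single tap, the subband filter reduces to multiplicative masking.
assert np.allclose(subband_filter(Y, H[:1]), S_mask)
```

Under these assumptions, the filter needs `K` coefficients per bin instead of one, which is consistent with the claim that only the output stage of a masking network has to grow. The extra taps let each band draw on past frames, which is what the subband approximation for reverberation calls for.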