This work proposes a new learning target based on reverberation time shortening (RTS) for speech dereverberation. The learning target for dereverberation is usually set to the direct-path speech, optionally with some early reflections. Such a target truncates the reverberation abruptly, which may make it unsuitable for network training. The proposed RTS target suppresses reverberation while preserving its exponential decay, which eases network training and thus reduces the signal distortion caused by prediction error. Moreover, this work experimentally studies the adaptation of our previously proposed FullSubNet speech denoising network to speech dereverberation. Experiments show that RTS is a more suitable learning target than direct-path speech and early reflections, in that it better suppresses both reverberation and signal distortion. FullSubNet achieves outstanding dereverberation performance.
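The idea of shortening the reverberation time while keeping the exponential decay can be illustrated with a minimal sketch. The abstract does not give the exact construction, so the following rests on assumptions: the room impulse response (RIR) is taken to decay as an ideal exponential with a known T60, the direct path is taken to be the largest-magnitude tap, and the function name `rts_target_rir` and its parameters are hypothetical. Under these assumptions, multiplying the RIR after the direct path by an extra exponential window yields a response whose decay corresponds to a shorter target T60.

```python
import numpy as np


def rts_target_rir(rir, fs, t60, target_t60):
    """Return an RIR whose decay is shortened from t60 to target_t60 (seconds).

    Illustrative sketch only: assumes an ideal exponential decay and that the
    direct path is the largest-magnitude tap. Not the paper's exact recipe.
    """
    rir = np.asarray(rir, dtype=float)
    n_direct = int(np.argmax(np.abs(rir)))  # assumed direct-path position

    # An amplitude envelope exp(-a * n) drops by 60 dB after T60 * fs samples
    # when a = 3 * ln(10) / (T60 * fs). The extra decay rate q is the
    # difference between the target rate and the original rate.
    q = 3.0 * np.log(10.0) / fs * (1.0 / target_t60 - 1.0 / t60)
    q = max(q, 0.0)  # never lengthen the reverberation

    n = np.arange(rir.shape[0])
    window = np.ones_like(rir)
    late = n > n_direct
    window[late] = np.exp(-q * (n[late] - n_direct))
    return rir * window
```

A target signal for training could then be obtained by convolving the clean speech with the shortened RIR, for example with `np.convolve(speech, rts_target_rir(rir, 16000, 0.8, 0.15))`. The values 0.8 s and 0.15 s are chosen purely for illustration.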