Although today's speech communication systems support various bandwidths from narrowband to super-wideband and beyond, state-of-the-art DNN methods for acoustic echo cancellation (AEC) lack modularity and bandwidth scalability. Our proposed DNN model builds upon a fully convolutional recurrent network (FCRN) and introduces scalability over various bandwidths up to a fullband (FB) system (48 kHz sampling rate). This modular approach allows joint wideband (WB) pre-training of the mask-based AEC and postfilter stages with dedicated losses, followed by separate training of both stages on FB data. A third, lightweight blind bandwidth extension stage is trained separately on FB data, which allows the WB postfilter output to be flexibly extended towards higher bandwidths until reaching FB. Thereby, higher-frequency noise and echo are reliably suppressed. On the ICASSP 2022 Acoustic Echo Cancellation Challenge blind test set we report competitive performance, showing robustness even under highly delayed echo and dynamic echo path changes.
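The three-stage data flow described above can be illustrated with a minimal per-frame sketch in NumPy. It is not the authors' implementation: the DNN stages are replaced by precomputed masks and a precomputed upper band, the 32 ms frame length is a hypothetical choice, the WB rate of 16 kHz is the conventional wideband sampling rate rather than a value stated here, and the assumption that the postfilter also acts as a multiplicative mask and that the extension stage simply appends upper-band bins is ours. All function names are illustrative.

```python
import numpy as np

WB_RATE = 16_000   # conventional wideband sampling rate (assumption, not stated in the text)
FB_RATE = 48_000   # fullband sampling rate given in the text
FRAME_MS = 32      # hypothetical frame length; the same duration is used at both rates


def num_bins(sample_rate, frame_ms=FRAME_MS):
    """Number of one-sided DFT bins for a frame of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    return frame_len // 2 + 1


def apply_mask(spec, mask):
    """Mask-based processing: scale each frequency bin of the spectrum."""
    return mask * spec


def place_in_fullband(wb_spec, upper_band):
    """Append upper-band bins above the WB bins.

    With equal frame duration the bin spacing is identical at both rates,
    so the WB bins coincide with the lowest FB bins.
    """
    return np.concatenate([wb_spec, upper_band], axis=-1)


def enhance_frame(mic_wb, aec_mask, pf_mask, upper_band):
    """Stage 1 (AEC mask), stage 2 (postfilter mask), stage 3 (band extension).

    In a real system the masks and the upper band would come from trained
    networks; here they are plain inputs so the data flow stays visible.
    """
    aec_out = apply_mask(mic_wb, aec_mask)
    pf_out = apply_mask(aec_out, pf_mask)
    return place_in_fullband(pf_out, upper_band)
```

Under these assumptions a WB frame has `num_bins(WB_RATE)` bins and the FB output has `num_bins(FB_RATE)` bins, so the extension stage only has to supply the difference. Stopping at a lower target rate would give an intermediate bandwidth in the same way.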