Time-variant factors often occur in real-world full-duplex communication applications. Some are caused by the complex environment, such as non-stationary noise and varying acoustic paths, while others are caused by the communication system itself, such as the dynamic delay between the far-end and near-end signals. Current end-to-end deep neural network (DNN) based methods usually model these time-variant components implicitly and can hardly handle unpredictable time variance in real-time speech enhancement. To capture the time-variant components explicitly, we propose a dynamic kernel generation (DKG) module that can be introduced as a learnable plug-in to a DNN-based end-to-end pipeline. Specifically, the DKG module generates a convolutional kernel for each input audio frame, so that the DNN model can dynamically adjust its weights according to the input signal during inference. Experimental results verify that the DKG module improves model performance under time-variant scenarios in the joint acoustic echo cancellation (AEC) and deep noise suppression (DNS) tasks.
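The per-frame kernel generation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the DKG architecture, so everything below is an assumption made for illustration: the hypothetical helpers `make_generator`, `generate_kernel`, `conv1d_same`, and `dynamic_filter`, the single linear projection standing in for the learned generator, the softmax normalisation of the kernel, and the random weights used in place of trained parameters. It shows only the general idea of input-dependent convolution weights, not the authors' implementation.

```python
import math
import random


def make_generator(frame_len, kernel_size, seed=0):
    """Random weights for a hypothetical linear kernel generator.

    These stand in for learned parameters; a real module would be trained.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(frame_len)
    return [[rng.uniform(-scale, scale) for _ in range(frame_len)]
            for _ in range(kernel_size)]


def generate_kernel(frame, weights):
    """Map one frame to a kernel: linear projection, then softmax.

    The softmax is an illustrative choice that keeps the taps positive
    and summing to one; it is not taken from the source.
    """
    logits = [sum(w * x for w, x in zip(row, frame)) for row in weights]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_same(frame, kernel):
    """Zero-padded 'same' 1-D correlation (assumes an odd kernel size)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(frame)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(frame):
                acc += w * frame[j]
        out.append(acc)
    return out


def dynamic_filter(frames, weights):
    """Generate a fresh kernel for each frame and apply it to that frame."""
    outputs, kernels = [], []
    for frame in frames:
        kernel = generate_kernel(frame, weights)
        kernels.append(kernel)
        outputs.append(conv1d_same(frame, kernel))
    return outputs, kernels


# Two toy frames with different content yield two different kernels.
weights = make_generator(frame_len=8, kernel_size=3)
frames = [
    [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0],
    [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
]
outputs, kernels = dynamic_filter(frames, weights)
```

The generator weights stay fixed after training, while the kernel applied to the signal changes from frame to frame. That is the sense in which the model adjusts its weights to the input during inference.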