In acoustic signal processing, target signals usually carry semantic information encoded in a hierarchical structure of short- and long-term contexts. Background noise, however, distorts these structures nonuniformly. Existing deep acoustic signal enhancement (ASE) architectures ignore these local and global effects. To address this problem, we propose integrating a novel temporal attentive-pooling (TAP) mechanism into a conventional convolutional recurrent neural network; we term the resulting model TAP-CRNN. The proposed approach considers both global and local attention for ASE tasks. Specifically, we first use a convolutional layer to extract local information from the acoustic signals and a recurrent neural network (RNN) to characterize temporal contextual information. Second, we exploit a novel attention mechanism to contextually process salient regions of the noisy signals. The proposed ASE system is evaluated on a benchmark infant cry dataset and compared with several well-known methods. The results show that TAP-CRNN can reduce noise components in infant cry signals more effectively under unseen background noises at challenging signal-to-noise levels.
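The abstract does not give the exact TAP formulation, so the following is only a minimal sketch of generic softmax-based temporal attentive pooling, under stated assumptions. It treats `frames` as stand-ins for per-frame RNN outputs, scores each frame against a hypothetical learned vector `query`, and forms one global context over the whole utterance plus one local context per frame over a neighborhood of `window` frames on each side. The function names, `query`, and `window` are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(
    frames: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Pool T frame vectors of dimension D into one context vector.

    frames: stand-ins for per-frame RNN outputs (assumption).
    query:  hypothetical learned scoring vector of dimension D.
    Returns (context, attention_weights).
    """
    # Dot-product score per frame, normalized over time.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Attention-weighted sum of the frames.
    context = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return context, weights


def local_global_context(
    frames: Sequence[Sequence[float]], query: Sequence[float], window: int = 2
) -> Tuple[List[float], List[List[float]]]:
    """Return a global context and one local context per frame.

    The local context for frame t pools frames t-window .. t+window,
    clipped at the sequence boundaries.
    """
    global_ctx, _ = attentive_pool(frames, query)
    local_ctx = []
    for t in range(len(frames)):
        lo, hi = max(0, t - window), min(len(frames), t + window + 1)
        ctx, _ = attentive_pool(frames[lo:hi], query)
        local_ctx.append(ctx)
    return global_ctx, local_ctx


if __name__ == "__main__":
    demo_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    demo_query = [10.0, 0.0]  # strongly favors frames with a large first feature
    g, l = local_global_context(demo_frames, demo_query, window=1)
    print("global context:", g)
    print("local contexts:", l)
```

In a full model the convolutional layer and the RNN would produce `frames`, and `query` would be trained jointly with them; both are fixed here to keep the sketch self-contained.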