Existing generative adversarial networks (GANs) for speech enhancement rely solely on the convolution operation, which may obscure temporal dependencies across an input sequence. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) operating on raw signal input. Further, we empirically study the effect of placing the self-attention layer at (de)convolutional layers with varying layer indices, as well as at all of them when memory allows. Our experiments show that introducing self-attention to SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance. Furthermore, applying it at different (de)convolutional layers does not significantly alter performance, suggesting that it can be conveniently applied at the highest-level (de)convolutional layer with the smallest memory overhead.
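To make the idea concrete, below is a minimal sketch (not the authors' implementation) of a SAGAN-style non-local self-attention layer for 1D feature maps, of the kind that could be coupled with a SEGAN (de)convolutional layer. The channel-reduction factor of 8 and the learnable residual gain `gamma` follow common practice for non-local attention and are assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention1d(nn.Module):
    """Non-local self-attention over the time axis of a 1D feature map."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.key   = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv1d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):                        # x: (batch, channels, time)
        q = self.query(x)                         # (B, C/r, T)
        k = self.key(x)                           # (B, C/r, T)
        v = self.value(x)                         # (B, C,   T)
        attn = torch.bmm(q.transpose(1, 2), k)    # (B, T, T) pairwise similarities
        attn = F.softmax(attn, dim=-1)            # attend over all time steps
        out = torch.bmm(v, attn.transpose(1, 2))  # (B, C, T) attended features
        return self.gamma * out + x               # residual connection

# Hypothetical usage: attach after a (de)convolutional layer's feature map.
feat = torch.randn(4, 64, 1024)   # (batch, channels, time), illustrative sizes
attended = SelfAttention1d(64)(feat)
```

Because attention is computed over all time-step pairs, memory grows quadratically with the feature-map length, which is why placing the layer at higher-level (shorter) (de)convolutional feature maps incurs the smallest overhead.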