Transformers have enabled major improvements in deep learning. They often outperform recurrent and convolutional models on many tasks while taking advantage of parallel processing. Recently, we proposed SepFormer, a self-attention-based model that obtains state-of-the-art results on the WSJ0-2Mix and WSJ0-3Mix datasets for speech separation. In this paper, we extend our previous work by providing results on more datasets, including LibriMix, WHAM!, and WHAMR!, which cover noisy and noisy-reverberant conditions. Moreover, we provide speech enhancement results for denoising and joint denoising+dereverberation on the WHAM! and WHAMR! datasets, respectively. We also investigate incorporating recently proposed efficient self-attention mechanisms into the SepFormer model and show that they can reduce memory requirements significantly while still outperforming the popular Conv-TasNet model on the WSJ0-2Mix dataset.