Time-domain Transformer neural networks have proven their superiority in speech separation tasks. However, these models usually have a large number of network parameters and thus often run into GPU memory limits. In this paper, we propose Tiny-Sepformer, a tiny Transformer network for speech separation. We present two techniques to reduce model parameters and memory consumption: (1) a Convolution-Attention (CA) block, which splits the vanilla Transformer into two paths, multi-head attention and 1D depthwise separable convolution, and (2) parameter sharing, which shares layer parameters within the CA block. In our experiments, Tiny-Sepformer greatly reduces the model size while achieving separation performance comparable to the vanilla Sepformer on the WSJ0-2/3Mix datasets.
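To make the two ideas concrete, here is a minimal PyTorch sketch based only on the description above. The module names, dimensions, the channel-wise split between the two paths, and the layer wiring are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of a Convolution-Attention (CA) block and parameter
# sharing; all hyperparameters and the exact split strategy are assumptions.
import torch
import torch.nn as nn


class CABlock(nn.Module):
    """CA block: the input is split into an attention path and a 1D
    depthwise-separable convolution path, then the two paths are merged."""

    def __init__(self, d_model=256, n_heads=8, kernel_size=3):
        super().__init__()
        half = d_model // 2
        # Path 1: multi-head self-attention on half of the channels.
        self.attn = nn.MultiheadAttention(half, n_heads, batch_first=True)
        # Path 2: depthwise separable 1D convolution on the other half.
        self.dw_conv = nn.Conv1d(half, half, kernel_size,
                                 padding=kernel_size // 2, groups=half)
        self.pw_conv = nn.Conv1d(half, half, kernel_size=1)
        self.norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):                     # x: (batch, time, d_model)
        a, c = x.chunk(2, dim=-1)             # split into the two paths
        a, _ = self.attn(a, a, a)             # attention path
        c = self.pw_conv(self.dw_conv(c.transpose(1, 2))).transpose(1, 2)
        y = self.norm(torch.cat([a, c], dim=-1) + x)   # merge + residual
        return y + self.ffn(y)


class SharedCAStack(nn.Module):
    """Parameter sharing: one CA block's weights are reused across n_layers
    passes, so the parameter count stays that of a single block."""

    def __init__(self, n_layers=4, **block_kwargs):
        super().__init__()
        self.block = CABlock(**block_kwargs)
        self.n_layers = n_layers

    def forward(self, x):
        for _ in range(self.n_layers):
            x = self.block(x)                 # same weights on every pass
        return x
```

Under this reading, running the attention and convolution paths on disjoint channel halves roughly halves the per-layer attention and feed-forward width, and reusing one block's weights across layers keeps the parameter count independent of depth; both effects reduce model size and memory.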