Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation

One of the leading single-channel speech separation (SS) models is based on a TasNet with a dual-path segmentation technique, where the size of each segment remains unchanged throughout all layers. In contrast, our key finding is that multi-granularity features are essential for enhancing contextual modeling and computational efficiency. We introduce a self-attentive network with a novel sandglass shape, named Sandglasset, which advances the state-of-the-art (SOTA) SS performance at a significantly smaller model size and computational cost. Moving forward through the blocks of Sandglasset, the temporal granularity of the features gradually becomes coarser until the midpoint of the network, and then successively becomes finer again towards the raw signal level. We also find that residual connections between features of the same granularity are critical for preserving information after passing through the bottleneck layer. Experiments show that Sandglasset, with only 2.3M parameters, achieves the best results on two benchmark SS datasets, WSJ0-2mix and WSJ0-3mix, where the SI-SNRi scores improve by an absolute 0.8 dB and 2.4 dB, respectively, compared to the prior SOTA results.
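
The sandglass-shaped granularity schedule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the number of blocks or the resampling factors, so `num_blocks=6` and `base=4` are assumptions chosen only for illustration, and the function names `sandglass_schedule` and `residual_pairs` are hypothetical.

```python
def sandglass_schedule(num_blocks: int, base: int = 4) -> list[int]:
    """Return a hypothetical per-block temporal downsampling factor.

    Factors grow (coarser granularity) up to the midpoint of the network
    and then shrink again (finer granularity) towards the raw signal level.
    The growth rule `base ** i` is an assumption, not taken from the paper.
    """
    if num_blocks < 2 or num_blocks % 2 != 0:
        raise ValueError("num_blocks must be an even number >= 2")
    half = num_blocks // 2
    coarsening = [base ** i for i in range(half)]  # e.g. 1, 4, 16
    return coarsening + coarsening[::-1]           # e.g. 16, 4, 1


def residual_pairs(factors: list[int]) -> list[tuple[int, int]]:
    """Pair each block in the first half with the later block that has
    the same granularity, i.e. where a residual connection would link them."""
    n = len(factors)
    return [(i, n - 1 - i) for i in range(n // 2)]


if __name__ == "__main__":
    factors = sandglass_schedule(num_blocks=6, base=4)
    print(factors)
    for src, dst in residual_pairs(factors):
        # Both ends of each residual connection share one granularity.
        print(f"block {src} -> block {dst} (factor {factors[src]})")
```

Under these assumed settings, the first and last blocks operate at the finest granularity, the two middle blocks at the coarsest, and each residual connection joins two blocks whose features have the same temporal resolution.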

