The past several years have witnessed significant progress in modeling the Cocktail Party Problem in terms of speech separation and speaker extraction. In recent years, multi-modal cues, including spatial information, facial expressions, and voiceprints, have been introduced into the speaker extraction task as complementary sources of information to achieve better performance. However, this makes the front-end model for speaker extraction large and hard to deploy on resource-constrained devices. In this paper, we address this problem with novel model architectures and model compression techniques, and propose a lightweight multi-modal framework for speaker extraction (dubbed LiMuSE). LiMuSE adopts group communication (GC) to split multi-modal high-dimensional features into groups of low-dimensional features with smaller widths that can be processed in parallel, and further applies an ultra-low-bit quantization strategy to reduce the model size. Experiments on the GRID dataset show that incorporating GC into the multi-modal framework achieves on-par or better performance with 24.86 times fewer parameters, and applying the quantization strategy to the GC-equipped model yields a further compression ratio of about 9 times while maintaining performance comparable to the baselines. Our code will be available at https://github.com/aispeech-lab/LiMuSE.
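To make the group communication (GC) idea concrete, the following is a minimal PyTorch sketch, not the exact LiMuSE module: a D-dimensional feature is split into K groups of width D/K, each group is passed through a small shared sub-network (so the per-group width, and hence the parameter count, shrinks), and a lightweight inter-group step lets the groups exchange information. The class name, hidden size, and the particular inter-group mixing used here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class GroupCommBlock(nn.Module):
    """Illustrative group-communication block (a sketch, not the LiMuSE architecture).

    Splits a D-dim feature into K groups of width D // K, applies a shared narrow
    transform to each group in parallel, then mixes information across groups.
    """

    def __init__(self, dim: int, num_groups: int, hidden: int = 64):
        super().__init__()
        assert dim % num_groups == 0, "feature dim must be divisible by the group count"
        self.num_groups = num_groups
        self.group_dim = dim // num_groups
        # Shared per-group transform: operates on narrow (dim // num_groups) features.
        self.intra = nn.Sequential(
            nn.Linear(self.group_dim, hidden), nn.PReLU(), nn.Linear(hidden, self.group_dim)
        )
        # Simple inter-group communication: mixes information across the K groups.
        self.inter = nn.Sequential(nn.Linear(num_groups, num_groups), nn.PReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        b, t, d = x.shape
        g = x.view(b, t, self.num_groups, self.group_dim)           # split into K groups
        g = g + self.intra(g)                                        # shared narrow transform, parallel over groups
        g = g + self.inter(g.transpose(-1, -2)).transpose(-1, -2)   # exchange information across groups
        return g.reshape(b, t, d)


if __name__ == "__main__":
    block = GroupCommBlock(dim=256, num_groups=8)
    y = block(torch.randn(2, 100, 256))
    print(y.shape)  # torch.Size([2, 100, 256])
```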
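The abstract also refers to an ultra-low-bit quantization strategy for reducing the model size. Below is a generic sketch of uniform symmetric fake-quantization of a weight tensor to a small number of bits; the function name, per-tensor granularity, and 3-bit setting are assumptions for illustration and may differ from the scheme actually used in LiMuSE.

```python
import torch


def quantize_weights_ultra_low_bit(w: torch.Tensor, num_bits: int = 3) -> torch.Tensor:
    """Uniform symmetric fake-quantization of a weight tensor to `num_bits` bits.

    A generic illustration only; the actual LiMuSE quantization strategy may differ
    in scheme and granularity.
    """
    qmax = 2 ** (num_bits - 1) - 1                      # e.g. 3 for signed 3-bit weights
    scale = (w.abs().max() / qmax).clamp_min(1e-8)      # per-tensor scale (an assumption)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                                     # dequantized weights for simulation


if __name__ == "__main__":
    w = torch.randn(64, 64)
    wq = quantize_weights_ultra_low_bit(w, num_bits=3)
    # Storage drops roughly from 32 bits to num_bits per weight (plus scale overhead),
    # which is the kind of reduction behind a high model-compression ratio.
    print((w - wq).abs().mean())
```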