Ultra-lightweight model design is an important topic for the deployment of existing speech enhancement and source separation techniques on low-resource platforms. Various lightweight model design paradigms have been proposed in recent years; however, most models still struggle to balance model size, model complexity, and model performance. In this paper, we propose the group communication with context codec (GC3) design to decrease both model size and complexity without sacrificing model performance. Group communication splits a high-dimensional feature into groups of low-dimensional features and applies a module to capture the inter-group dependency. A model can then be applied to the groups in parallel with a significantly smaller width. A context codec is applied to decrease the length of a sequential feature: a context encoder compresses the temporal context of local features into a single feature representing the global characteristics of the context, and a context decoder decompresses the transformed global features back into context features. Experimental results show that GC3 can achieve on-par or better performance than a wide range of baseline architectures with as little as 2.5% of the model size.
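As a rough illustration (not the paper's actual implementation), the two ideas can be sketched in NumPy. The shared weight matrices, the tanh nonlinearity, and the mean-pool context compression below are illustrative stand-ins for the learned modules the paper uses; only the shapes and the split/mix/compress/decompress structure are taken from the description above:

```python
import numpy as np

def group_communication(x, num_groups, W_intra, W_inter):
    """Split the channel dimension into groups, transform each group
    with a small shared weight matrix, then mix information across
    the group axis to capture inter-group dependency."""
    T, D = x.shape
    g = D // num_groups
    groups = x.reshape(T, num_groups, g)                 # (T, K, g)
    intra = np.tanh(groups @ W_intra)                    # shared per-group transform
    inter = np.tanh(np.einsum('tkg,kj->tjg', intra, W_inter))  # inter-group mixing
    return inter.reshape(T, D)

def context_encode(x, context):
    """Compress each non-overlapping context window of `context` frames
    into a single global feature (mean pooling as a crude stand-in)."""
    T, D = x.shape
    x = x[: T - T % context]                             # drop the ragged tail
    return x.reshape(-1, context, D).mean(axis=1)        # (T // context, D)

def context_decode(z, context):
    """Decompress global features back to the original frame rate by
    broadcasting each one over its context window."""
    return np.repeat(z, context, axis=0)                 # (T, D)
```

The compressed sequence is `context` times shorter, so whatever separation module runs between the encoder and decoder processes proportionally fewer time steps, which is where the complexity saving comes from.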