In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously. More specifically, MOSA-Net is designed to estimate the speech quality, intelligibility, and distortion assessment scores of an input test speech signal. It comprises a convolutional neural network and bidirectional long short-term memory (CNN-BLSTM) architecture for representation extraction, followed by a multiplicative attention layer and a fully-connected layer for each assessment metric. In addition, cross-domain features (spectral and time-domain features) and latent representations from self-supervised learned models are used as inputs, combining rich acoustic information from different speech representations to obtain more accurate assessments. Experimental results show that MOSA-Net can precisely predict perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI) scores when tested on noisy and enhanced speech utterances under both seen and unseen test conditions. Moreover, MOSA-Net, originally trained to assess objective scores, can serve as a pre-trained model that is effectively adapted into an assessment model for predicting subjective quality and intelligibility scores with a limited amount of training data. In light of the confirmed prediction capability, we further adopt the latent representations of MOSA-Net to guide the speech enhancement (SE) process and accordingly derive a quality-intelligibility (QI)-aware SE (QIA-SE) approach. Experimental results show that QIA-SE provides superior enhancement performance compared with the baseline SE system in terms of both objective evaluation metrics and a qualitative listening test.
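To make the per-metric attention pooling concrete, below is a minimal NumPy sketch of how a multiplicative (Luong-style) attention layer can collapse frame-level CNN-BLSTM outputs into a single utterance-level vector, which a fully-connected layer then maps to one assessment score. All shapes, parameter names, and the random weights here are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multiplicative_attention(H, w, W):
    """Pool frame-wise representations H (T x d) into one utterance vector.
    w (d,) is a query vector and W (d x d) a bilinear weight matrix;
    both are illustrative placeholders, not learned parameters."""
    scores = H @ W @ w           # (T,) unnormalized per-frame scores
    alpha = softmax(scores)      # (T,) attention weights summing to 1
    return alpha @ H             # (d,) attention-weighted frame average

rng = np.random.default_rng(0)
T, d = 50, 16                    # e.g. 50 frames of 16-dim features
H = rng.standard_normal((T, d))  # stand-in for CNN-BLSTM frame outputs
w = rng.standard_normal(d)
W = rng.standard_normal((d, d))
utt_vec = multiplicative_attention(H, w, W)

# One fully-connected "head" per metric (PESQ, STOI, SDI) would map
# utt_vec to a scalar; a single linear head is sketched here.
fc = rng.standard_normal(d)
score = float(utt_vec @ fc)
```

In a multi-objective setup like MOSA-Net's, each metric would get its own attention layer and fully-connected head on top of the shared CNN-BLSTM representation, so gradients from all tasks shape the shared encoder.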
