超载语音分离和识别的混合精密度 DNN 校验 (Mixed Precision DNN Qunatization for Overlapped Speech Separation and Recognition) - 专知论文

会员服务 ·

0

查准率/准确率 · 分离的 · DNN · UniFormer · 可约的 ·

2021 年 11 月 29 日

Mixed Precision DNN Qunatization for Overlapped Speech Separation and Recognition

翻译：超载语音分离和识别的混合精密度 DNN 校验

Junhao Xu,Jianwei Yu,Xunying Liu,Helen Meng

Recognition of overlapped speech has been a highly challenging task to date. State-of-the-art multi-channel speech separation system are becoming increasingly complex and expensive for practical applications. To this end, low-bit neural network quantization provides a powerful solution to dramatically reduce their model size. However, current quantization methods are based on uniform precision and fail to account for the varying performance sensitivity at different model components to quantization errors. In this paper, novel mixed precision DNN quantization methods are proposed by applying locally variable bit-widths to individual TCN components of a TF masking based multi-channel speech separation system. The optimal local precision settings are automatically learned using three techniques. The first two approaches utilize quantization sensitivity metrics based on either the mean square error (MSE) loss function curvature, or the KL-divergence measured between full precision and quantized separation models. The third approach is based on mixed precision neural architecture search. Experiments conducted on the LRS3-TED corpus simulated overlapped speech data suggest that the proposed mixed precision quantization techniques consistently outperform the uniform precision baseline speech separation systems of comparable bit-widths in terms of SI-SNR and PESQ scores as well as word error rate (WER) reductions up to 2.88% absolute (8% relative).

翻译：迄今为止,承认重叠言论是一项极具挑战性的任务。对于实际应用来说,最先进的多通道语音分离系统正在变得日益复杂和昂贵。为此,低位神经网络量化为大幅缩小模型大小提供了强有力的解决方案。然而,目前的量化方法基于统一精确度,没有考虑到不同模型组件不同性能敏感度对于量化错误的不同程度。在本文中,通过对基于多通道语音分离系统的TF掩码系统的各个TCN部件应用本地可变比特宽度,提出了新颖的精度精度达NNNN量化方法。最佳本地精确度设置是用三种技术自动学习的。前两种方法使用基于平均平方差(MSE)损失函数曲线的量化灵敏度指标,或者没有考虑到在完全精确度和四分解分离模型之间测量到的不同性能敏感度。在混合精度神经结构搜索的基础上,对基于基于多位性微分辨度的语音结构进行实验,显示拟议的混合精确度缩略度技术在SIRS-88的绝对精确度缩度缩度上,是SIER-RIS的精确度缩度缩度的精确度缩度缩度为2。

0

相关内容

查准率/准确率

查准率/准确率

基于粗粒度数据流架构的稀疏卷积神经网络加速

专知会员服务

23+阅读 · 2021年7月15日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【陈天奇】TVM：端到端自动深度学习编译器，244页ppt

【陈天奇】TVM：端到端自动深度学习编译器，244页ppt

专知会员服务

87+阅读 · 2020年5月11日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

95 FPS！超快速3D目标检测网络开源了！SFA3D：基于LiDAR的实时、准确的3D目标检测模型

95 FPS！超快速3D目标检测网络开源了！SFA3D：基于LiDAR的实时、准确的3D目标检测模型

CVer

4+阅读 · 2020年11月14日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

谷歌再获语音识别新进展：利用序列转导来实现多人语音识别和说话人分类

谷歌再获语音识别新进展：利用序列转导来实现多人语音识别和说话人分类

AI科技评论

7+阅读 · 2019年8月24日

人脸检测库：libfacedetection

人脸检测库：libfacedetection

Python程序员

15+阅读 · 2019年3月22日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

用 Intel MKL-DNN 加速 CPU 上的深度学习

用 Intel MKL-DNN 加速 CPU 上的深度学习

ApacheMXNet

4+阅读 · 2018年4月11日

学界 | 谷歌语音识别端到端系统单词错误率降至5.6%，较传统模型提升16%

学界 | 谷歌语音识别端到端系统单词错误率降至5.6%，较传统模型提升16%

AI科技评论

5+阅读 · 2017年12月16日

【推荐】卷积神经网络类间不平衡问题系统研究

【推荐】卷积神经网络类间不平衡问题系统研究

机器学习研究会

6+阅读 · 2017年10月18日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

Signal Quality Assessment of Photoplethysmogram Signals using Quantum Pattern Recognition and lightweight CNN Architecture

Arxiv

0+阅读 · 2022年2月1日

A Training Framework for Stereo-Aware Speech Enhancement using Deep Neural Networks

Arxiv

0+阅读 · 2022年1月31日

On the accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit and novel 16-bit number formats

Arxiv

0+阅读 · 2022年1月31日

Integer-only Zero-shot Quantization for Efficient Speech Recognition

Arxiv

0+阅读 · 2022年1月30日

A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain

Arxiv

0+阅读 · 2022年1月28日

An application of Saddlepoint Approximation for period detection of stellar light observations

Arxiv

0+阅读 · 2022年1月27日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

VIP会员

文章信息

相关主题

查准率/准确率

相关VIP内容

基于粗粒度数据流架构的稀疏卷积神经网络加速

专知会员服务

23+阅读 · 2021年7月15日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【陈天奇】TVM：端到端自动深度学习编译器，244页ppt

【陈天奇】TVM：端到端自动深度学习编译器，244页ppt

专知会员服务

87+阅读 · 2020年5月11日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《军事域人工智能风险、机遇与治理战略指导报告》2025最新76页报告

《杀伤网与精确规模：智能饱和战争时代的战略要务-印度视角》2025最新报告

俄乌冲突的地缘政治与军事教训（万字长文）

《弹药快速效能建模：推进互操作性与技术优势》2025最新26页报告

相关资讯

95 FPS！超快速3D目标检测网络开源了！SFA3D：基于LiDAR的实时、准确的3D目标检测模型

95 FPS！超快速3D目标检测网络开源了！SFA3D：基于LiDAR的实时、准确的3D目标检测模型

CVer

4+阅读 · 2020年11月14日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

谷歌再获语音识别新进展：利用序列转导来实现多人语音识别和说话人分类

谷歌再获语音识别新进展：利用序列转导来实现多人语音识别和说话人分类

AI科技评论

7+阅读 · 2019年8月24日

人脸检测库：libfacedetection

人脸检测库：libfacedetection

Python程序员

15+阅读 · 2019年3月22日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

用 Intel MKL-DNN 加速 CPU 上的深度学习

用 Intel MKL-DNN 加速 CPU 上的深度学习

ApacheMXNet

4+阅读 · 2018年4月11日

学界 | 谷歌语音识别端到端系统单词错误率降至5.6%，较传统模型提升16%

学界 | 谷歌语音识别端到端系统单词错误率降至5.6%，较传统模型提升16%

AI科技评论

5+阅读 · 2017年12月16日

【推荐】卷积神经网络类间不平衡问题系统研究

【推荐】卷积神经网络类间不平衡问题系统研究

机器学习研究会

6+阅读 · 2017年10月18日

【推荐】深度学习目标检测概览

【推荐】深度学习目标检测概览

机器学习研究会

10+阅读 · 2017年9月1日

相关论文

Signal Quality Assessment of Photoplethysmogram Signals using Quantum Pattern Recognition and lightweight CNN Architecture

Arxiv

0+阅读 · 2022年2月1日

A Training Framework for Stereo-Aware Speech Enhancement using Deep Neural Networks

Arxiv

0+阅读 · 2022年1月31日

On the accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit and novel 16-bit number formats

Arxiv

0+阅读 · 2022年1月31日

Integer-only Zero-shot Quantization for Efficient Speech Recognition

Arxiv

0+阅读 · 2022年1月30日

A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain

Arxiv

0+阅读 · 2022年1月28日

An application of Saddlepoint Approximation for period detection of stellar light observations

Arxiv

0+阅读 · 2022年1月27日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

微信扫码咨询专知VIP会员