Self-supervised learning (SSL) speech models have achieved unprecedented success in speech representation learning, but some questions about their representation ability remain unanswered. This paper addresses two of them: (1) Can SSL speech models handle non-speech audio? (2) Do different SSL speech models capture different aspects of audio features? To answer these questions, we conduct extensive experiments on a wide range of speech and non-speech audio datasets to evaluate the representation ability of two currently state-of-the-art SSL speech models, wav2vec 2.0 and HuBERT. The experiments were carried out during the NeurIPS 2021 HEAR Challenge using the standard evaluation pipeline provided by the challenge organizers. The results show that (1) SSL speech models can extract meaningful features from a wide range of non-speech audio, though they may fail on certain types of datasets; (2) different SSL speech models capture different aspects of audio features. These two conclusions provide a foundation for ensembling representation models. We further propose an ensemble framework that fuses the embeddings of multiple speech representation models. Our framework outperforms state-of-the-art SSL speech/audio models and achieves generally superior performance across the HEAR Challenge datasets compared with other participating teams. Our code is available at https://github.com/tony10101105/HEAR-2021-NeurIPS-Challenge---NTU-GURA.
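The abstract mentions an ensemble framework that fuses the embeddings of multiple speech representation models. A minimal sketch of one common fusion strategy, frame-wise concatenation of the feature dimensions, is shown below; the paper does not specify its exact fusion method here, so the function name, embedding shapes, and random placeholder arrays (standing in for wav2vec 2.0 / HuBERT outputs) are illustrative assumptions.

```python
import numpy as np

def fuse_embeddings(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Fuse frame-level embeddings from two representation models by
    concatenating along the feature dimension.

    Both inputs are (num_frames, dim) arrays; frame counts must match,
    which in practice may require resampling one model's frame rate.
    """
    assert emb_a.shape[0] == emb_b.shape[0], "frame counts must match"
    return np.concatenate([emb_a, emb_b], axis=1)

# Hypothetical embeddings: 50 frames, 768 dims each (typical base-model size)
wav2vec_emb = np.random.randn(50, 768)
hubert_emb = np.random.randn(50, 768)

fused = fuse_embeddings(wav2vec_emb, hubert_emb)
print(fused.shape)  # (50, 1536)
```

The fused representation can then be passed to a shallow downstream classifier, which is how the HEAR Challenge evaluation pipeline scores embeddings.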