The audio spectrogram is a time-frequency representation that has been widely used for audio classification. The temporal resolution of a spectrogram depends on the hop size. Previous works generally assume the hop size should be a constant value, such as ten milliseconds. However, a fixed hop size, and hence a fixed resolution, is not always optimal for different types of sound. This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution learning to improve the performance of audio classification models. Given a spectrogram calculated with a fixed hop size, DiffRes merges non-essential time frames while preserving important frames. DiffRes acts as a "drop-in" module between an audio spectrogram and a classifier, and can be optimized end-to-end. We evaluate DiffRes on mel-spectrograms with state-of-the-art classifier backbones on five different audio classification subtasks. Compared with using a fixed-resolution mel-spectrogram, the DiffRes-based method achieves the same or better classification accuracy with at least 25% fewer temporal dimensions at the feature level, which also reduces the computational cost. Starting from a high-temporal-resolution spectrogram, such as one with a one-millisecond hop size, we show that DiffRes can improve classification accuracy at the same computational complexity.
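To make the "drop-in module" idea concrete, below is a minimal sketch, assuming PyTorch, of a differentiable frame-merging layer placed between a mel-spectrogram and a classifier. This is not the authors' implementation: the module name `DiffResSketch`, the per-frame scoring convolution, and the cumulative-importance soft assignment are illustrative assumptions standing in for the actual DiffRes algorithm described in the paper.

```python
# Illustrative sketch of differentiable frame merging (not the official DiffRes).
import torch
import torch.nn as nn

class DiffResSketch(nn.Module):
    def __init__(self, n_mels: int, reduction: float = 0.25):
        super().__init__()
        self.reduction = reduction
        # A 1-D convolution over time predicts a per-frame importance score.
        self.score = nn.Conv1d(n_mels, 1, kernel_size=5, padding=2)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, time, n_mels), e.g. a mel-spectrogram with a small hop size.
        b, t, _ = spec.shape
        t_out = int(t * (1.0 - self.reduction))  # target temporal length
        s = torch.sigmoid(self.score(spec.transpose(1, 2))).squeeze(1)  # (b, t)
        # Cumulative importance maps each input frame to a fractional output slot;
        # low-score frames advance the cursor little, so neighbours get merged.
        pos = torch.cumsum(s, dim=1)
        pos = pos / pos[:, -1:].clamp_min(1e-6) * (t_out - 1)  # in [0, t_out - 1]
        # Soft (triangular) assignment of every input frame to the output frames;
        # gradients flow through the weights back into the scoring convolution.
        grid = torch.arange(t_out, device=spec.device).view(1, t_out, 1)
        w = torch.relu(1.0 - (pos.unsqueeze(1) - grid).abs())  # (b, t_out, t)
        w = w / w.sum(dim=2, keepdim=True).clamp_min(1e-6)
        return torch.bmm(w, spec)  # (b, t_out, n_mels)

# Drop-in usage between the spectrogram and an arbitrary classifier backbone:
spec = torch.randn(4, 1000, 64)          # e.g. 10 s of audio at 10 ms hop, 64 mel bins
merged = DiffResSketch(n_mels=64)(spec)  # (4, 750, 64): 25% fewer time frames
```

The merged output keeps the feature dimension unchanged, so any backbone that consumes a mel-spectrogram can consume it directly, which is what allows the module to be trained end-to-end with the classifier's loss.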