通过可培训基础功能了解音频特点 (Understanding Audio Features via Trainable Basis Functions) - 专知论文

会员服务 ·

0

泛函 · 模型复杂度 · 可理解性 · Boosting（一种模型训练加速方式） · MoDELS ·

2022 年 4 月 25 日

Understanding Audio Features via Trainable Basis Functions

翻译：通过可培训基础功能了解音频特点

Kwan Yee Heung,Kin Wai Cheuk,Dorien Herremans

from arxiv, under review in Interspeech 2022

In this paper we explore the possibility of maximizing the information represented in spectrograms by making the spectrogram basis functions trainable. We experiment with two different tasks, namely keyword spotting (KWS) and automatic speech recognition (ASR). For most neural network models, the architecture and hyperparameters are typically fine-tuned and optimized in experiments. Input features, however, are often treated as fixed. In the case of audio, signals can be mainly expressed in two main ways: raw waveforms (time-domain) or spectrograms (time-frequency-domain). In addition, different spectrogram types are often used and tailored to fit different applications. In our experiments, we allow for this tailoring directly as part of the network. Our experimental results show that using trainable basis functions can boost the accuracy of Keyword Spotting (KWS) by 14.2 percentage points, and lower the Phone Error Rate (PER) by 9.5 percentage points. Although models using trainable basis functions become less effective as the model complexity increases, the trained filter shapes could still provide us with insights on which frequency bins are important for that specific task. From our experiments, we can conclude that trainable basis functions are a useful tool to boost the performance when the model complexity is limited.

翻译：在本文中,我们探索了通过使光谱基础功能可以培训而最大限度地增加光谱图中所含信息的可能性。我们实验了两种不同的任务, 即关键识别( KWS) 和自动语音识别( ASR ) 。对于大多数神经网络模型来说, 建筑和超参数通常是在实验中进行微调和优化的。但是, 输入功能通常被视为固定的。在音频方面, 信号主要可以以两种主要方式表达: 原始波形( 时间- 数据) 或光谱( 时间- 频率- 数据) 。此外, 不同的光谱类型经常被使用和定制以适应不同的应用程序。在我们的实验中, 我们允许直接将这种设计作为网络的一部分。我们的实验结果显示, 使用可训练的基础功能可以提高关键词点( KWS) 的精确度14.2 个百分点, 并将电话错误率降低9.5个百分点。尽管使用可训练基础功能的模型随着模型复杂性的提高而变得不那么有效, 受过训练的过滤器形状仍然可以使我们洞察到哪个频率箱对具体任务来说是重要的。从我们的工作复杂性判断, 。从有限的实验可以得出一个有限的推力基础。

0

相关内容

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】用Python/OpenCV实现增强现实

【推荐】用Python/OpenCV实现增强现实

机器学习研究会

15+阅读 · 2017年11月16日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

ATM基因在糖尿病大血管病变防治中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

ADS中检测快中子束的GEM探测器的研制

国家自然科学基金

0+阅读 · 2013年12月31日

针刺对脑缺血星形胶质细胞-神经元能量偶联的调节机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

组蛋白去乙酰化酶抑制剂对骨关节炎中Notch-NFAT信号通路调控的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

小样本空间制图

国家自然科学基金

0+阅读 · 2012年12月31日

HDPR1-δ-catenin通路在非小细胞肺癌侵袭和凋亡中的作用机制

国家自然科学基金

0+阅读 · 2012年12月31日

人B7-H3受体的分子鉴定及其协同刺激作用的研究

国家自然科学基金

0+阅读 · 2009年12月31日

糖尿病视网膜病变阳虚病机的代谢组学研究

国家自然科学基金

0+阅读 · 2008年12月31日

Lightweight Conditional Model Extrapolation for Streaming Data under Class-Prior Shift

Arxiv

0+阅读 · 2022年6月10日

Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions

Arxiv

0+阅读 · 2022年6月9日

Smoothness and effective regularizations in learned embeddings for shape matching

Arxiv

0+阅读 · 2022年6月8日

Gradient Obfuscation Gives a False Sense of Security in Federated Learning

Arxiv

0+阅读 · 2022年6月8日

The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation

Arxiv

0+阅读 · 2022年6月7日

A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning

A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning

Arxiv

23+阅读 · 2021年9月29日

The Principles of Deep Learning Theory

Arxiv

65+阅读 · 2021年6月18日

Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition

Arxiv

15+阅读 · 2021年4月12日

A Comprehensive Survey on Transfer Learning

A Comprehensive Survey on Transfer Learning

Arxiv

121+阅读 · 2019年11月7日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

VIP会员

文章信息

相关主题

模型复杂度

Boosting（一种模型训练加速方式）

相关VIP内容

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】用Python/OpenCV实现增强现实

【推荐】用Python/OpenCV实现增强现实

机器学习研究会

15+阅读 · 2017年11月16日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

相关论文

Lightweight Conditional Model Extrapolation for Streaming Data under Class-Prior Shift

Arxiv

0+阅读 · 2022年6月10日

Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions

Arxiv

0+阅读 · 2022年6月9日

Smoothness and effective regularizations in learned embeddings for shape matching

Arxiv

0+阅读 · 2022年6月8日

Gradient Obfuscation Gives a False Sense of Security in Federated Learning

Arxiv

0+阅读 · 2022年6月8日

The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation

Arxiv

0+阅读 · 2022年6月7日

A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning

A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning

Arxiv

23+阅读 · 2021年9月29日

The Principles of Deep Learning Theory

Arxiv

65+阅读 · 2021年6月18日

Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition

Arxiv

15+阅读 · 2021年4月12日

A Comprehensive Survey on Transfer Learning

A Comprehensive Survey on Transfer Learning

Arxiv

121+阅读 · 2019年11月7日

Zero-Shot Transfer Learning for Event Extraction

Arxiv

10+阅读 · 2017年7月4日

相关基金

ATM基因在糖尿病大血管病变防治中的作用机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

ADS中检测快中子束的GEM探测器的研制

国家自然科学基金

0+阅读 · 2013年12月31日

针刺对脑缺血星形胶质细胞-神经元能量偶联的调节机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

关于AI-半环簇与 Conway半环簇的研究

国家自然科学基金

1+阅读 · 2012年12月31日

组蛋白去乙酰化酶抑制剂对骨关节炎中Notch-NFAT信号通路调控的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

小样本空间制图

国家自然科学基金

0+阅读 · 2012年12月31日

HDPR1-δ-catenin通路在非小细胞肺癌侵袭和凋亡中的作用机制

国家自然科学基金

0+阅读 · 2012年12月31日

人B7-H3受体的分子鉴定及其协同刺激作用的研究

国家自然科学基金

0+阅读 · 2009年12月31日

糖尿病视网膜病变阳虚病机的代谢组学研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员