We propose a learnable, content-adaptive front end for audio signal processing. Before the advent of deep learning, audio systems relied on fixed, non-learnable front-end representations such as the spectrogram or mel-spectrogram, with or without neural architectures downstream. As convolutional architectures came to support applications such as ASR and acoustic scene understanding, the field shifted to learnable front ends, in which both the basis functions and their weights are learned from scratch and optimized for the task of interest. With the move to transformer-based architectures that contain no convolutional blocks, a linear layer instead projects small waveform patches onto a small latent dimension before feeding them to the transformer. In this work, we propose a method for computing a content-adaptive, learnable time-frequency representation. We pass each audio signal through a bank of convolutional filters, each producing a fixed-dimensional vector. This is akin to learning a bank of finite impulse-response (FIR) filterbanks and routing the input signal through the optimal filterbank depending on the content of the input. A content-adaptive learnable time-frequency representation may be broadly applicable beyond the experiments in this paper.
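The idea of routing a signal through one of several candidate filterbanks based on its content can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the bank count `K`, filter count `F`, filter length `L`, the energy-based content summary, and the linear selector `selector_w` are all assumptions introduced for illustration; in the proposed system these components would be learned end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): K candidate FIR filterbanks,
# each with F filters of length L. In practice these are learned weights.
K, F, L = 4, 8, 16
banks = rng.standard_normal((K, F, L)) * 0.1

def filterbank_features(x, bank):
    """Convolve x with each FIR filter in one bank and average-pool,
    yielding one fixed-dimensional vector of shape (F,)."""
    return np.array([np.mean(np.abs(np.convolve(x, h, mode="valid")))
                     for h in bank])

def content_adaptive_frontend(x, banks, selector_w):
    """Soft-select among candidate filterbanks based on signal content."""
    # Responses of all K banks to this signal: shape (K, F).
    feats = np.stack([filterbank_features(x, b) for b in banks])
    # Illustrative content summary: a linear score per bank, then softmax.
    logits = feats @ selector_w                 # (K,)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                    # soft routing over banks
    # Content-weighted combination of the K filterbank outputs: shape (F,).
    return weights @ feats

selector_w = rng.standard_normal(F)   # stand-in for a learned selector
x = rng.standard_normal(1000)         # toy waveform
rep = content_adaptive_frontend(x, banks, selector_w)
print(rep.shape)  # (8,)
```

A soft (softmax-weighted) combination is used here rather than a hard argmax over banks so that the routing remains differentiable, which is what makes the whole front end trainable by gradient descent.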