Pattern recognition from audio signals is an active research topic encompassing audio tagging, acoustic scene classification, music classification, and other areas. Spectrograms and mel-frequency cepstral coefficients (MFCCs) are among the most commonly used features for audio signal analysis and classification. Recently, deep convolutional neural networks (CNNs) have been applied successfully to audio classification problems using spectrogram-based 2D features. In this paper, we present SpectNet, an integrated front-end layer that extracts spectrogram features inside a CNN architecture and can be used for audio pattern recognition tasks. The front-end layer uses learnable gammatone filters that are initialized from mel-scale filters. The layer outputs a 2D spectrogram image that can be fed into a 2D CNN for classification. The parameters of the entire network, including the front-end filterbank, are updated via back-propagation, which allows the spectrogram-image features to be fine-tuned to the target audio dataset. The proposed method is evaluated on two audio signal classification tasks: heart sound anomaly detection and acoustic scene classification. Compared to classical spectrogram image features, it shows a significant 1.02\% improvement in MACC on the heart sound classification task and a 2.11\% improvement in accuracy on the acoustic scene classification task. The source code of our experiments can be found at \url{https://github.com/mHealthBuet/SpectNet}.
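As a rough illustration of the initialization step described above, the following sketch builds a bank of gammatone kernels whose center frequencies are spaced on the mel scale. It is not the authors' implementation: the function names, the filter order of 4, the ERB-based bandwidth rule, and the peak normalization are all assumptions chosen for illustration. In a trainable front-end, the center frequencies and bandwidths would be network parameters and the kernels would be applied as a 1D convolution over the waveform; this sketch only computes the starting values.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula (assumed variant: 2595 * log10(1 + f / 700)).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min, f_max):
    # Center frequencies equally spaced on the mel scale, strictly inside (f_min, f_max).
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + (i + 1) * step) for i in range(n_filters)]

def erb(f_hz):
    # Equivalent rectangular bandwidth in Hz (Glasberg-Moore approximation).
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def gammatone_kernel(fc, sample_rate, length, order=4):
    # Sampled gammatone impulse response:
    #   g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)
    # The bandwidth rule b = 1.019 * ERB(fc) is an assumption for this sketch.
    b = 1.019 * erb(fc)
    kernel = []
    for n in range(length):
        t = n / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        kernel.append(envelope * math.cos(2.0 * math.pi * fc * t))
    # Normalize to unit peak amplitude; fall back to 1.0 if the kernel is all zeros.
    peak = max(abs(v) for v in kernel) or 1.0
    return [v / peak for v in kernel]

def init_filterbank(n_filters, f_min, f_max, sample_rate, length):
    # One gammatone kernel per mel-spaced center frequency.
    centers = mel_center_frequencies(n_filters, f_min, f_max)
    return [gammatone_kernel(fc, sample_rate, length) for fc in centers]
```

Calling `init_filterbank(8, 20.0, 1000.0, 8000, 128)`, for example, returns eight kernels of 128 samples each. Convolving a waveform with each kernel and stacking the log-energies of the outputs over time would give a spectrogram-like 2D array of the kind the abstract describes.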