Audio pattern recognition (APR) is an important research topic with applications in many areas of everyday life, so APR systems that are both accurate and efficient are needed in practice. In this paper, we propose a new convolutional neural network (CNN) architecture and a method for improving the inference speed of CNN-based systems for APR tasks. Experiments on four audio datasets confirm that the proposed method also improves the performance of our systems. In addition, we investigate the impact of data augmentation techniques and transfer learning on the performance of our systems. Our best system achieves a mean average precision (mAP) of 0.450 on the AudioSet dataset. Although this value is lower than that of the state-of-the-art system, the proposed system is 7.1x faster and 9.7x smaller. On the ESC-50, UrbanSound8K, and RAVDESS datasets, we obtain state-of-the-art results with accuracies of 0.961, 0.908, and 0.748, respectively. Our system for the ESC-50 dataset is 1.7x faster and 2.3x smaller than the previous best system. For the RAVDESS dataset, our system is 3.3x smaller than the previous best system. We name our systems "Efficient Residual Audio Neural Networks".
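The abstract reports mAP on AudioSet without stating how it is computed. As a rough illustration only, the sketch below uses the common definition for multi-label tagging: average precision (AP) is computed per class from ranked scores, and mAP is the unweighted mean over classes. The function names and the toy data are hypothetical and are not taken from the paper; the authors' evaluation code may differ (for example in how ties or classes without positives are handled).

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one class: mean of precision@k over the ranks k that hold a positive."""
    # Rank clips from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumption: a class with no positive clips contributes an AP of 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    label_matrix: List[List[int]], score_matrix: List[List[float]]
) -> float:
    """mAP: unweighted mean of per-class AP. Rows are clips, columns are classes."""
    num_classes = len(label_matrix[0])
    per_class = []
    for c in range(num_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        per_class.append(average_precision(labels, scores))
    return sum(per_class) / num_classes


# Toy example (made-up data): 4 clips, 2 classes.
toy_labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
toy_scores = [[0.9, 0.2], [0.8, 0.9], [0.7, 0.1], [0.1, 0.3]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

In the toy data, the first class has positives at ranks 1 and 3 and the second class has its only positive at rank 1, so the two per-class AP values are averaged with equal weight regardless of how many positives each class has.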