While efficient architectures and a plethora of augmentations for end-to-end image classification have been proposed and heavily investigated, state-of-the-art techniques for audio classification still rely on numerous representations of the audio signal together with large architectures fine-tuned from large datasets. By exploiting the inherently lightweight nature of audio together with novel audio augmentations, we present an efficient end-to-end network with strong generalization ability. Experiments on a variety of sound classification datasets demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings. Public code will be available.