While efficient architectures and a wide range of augmentations for end-to-end image classification have been proposed and heavily investigated, state-of-the-art techniques for audio classification still rely on numerous representations of the audio signal together with large architectures fine-tuned from large datasets. By exploiting the inherently lightweight nature of audio, together with novel audio augmentations, we present an efficient end-to-end network with strong generalization ability. Experiments on a variety of sound classification datasets demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings. Code is publicly available at: \href{https://github.com/Alibaba-MIIL/AudioClassfication}{https://github.com/Alibaba-MIIL/AudioClassfication}