Research in Environmental Sound Classification (ESC) has grown steadily with the emergence of deep learning algorithms. However, data scarcity remains a major hurdle to significant advances in this domain. Data augmentation offers a promising solution to this problem. While Generative Adversarial Networks (GANs) have been successful in generating synthetic speech and the sounds of musical instruments, they have rarely been applied to the generation of environmental sounds. This paper presents EnvGAN, the first application of GANs to the adversarial generation of environmental sounds. Our experiments on three standard ESC datasets show that EnvGAN can synthesize audio similar to the samples in those datasets. The proposed augmentation method outperforms most state-of-the-art techniques for audio augmentation.