Despite consistent advancement in powerful deep learning techniques in recent years, large amounts of training data are still necessary for the models to avoid overfitting. Synthetic datasets using generative adversarial networks (GAN) have recently been generated to overcome this problem. Nevertheless, despite advancements, GAN-based methods are usually hard to train or fail to generate high-quality data samples. In this paper, we propose an environmental sound classification augmentation technique based on the diffusion probabilistic model with DPM-Solver$++$ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we train a top-k selection discriminator on the dataset. According to the experiment results, the synthesized spectrograms have similar features to the original dataset and can significantly increase the classification accuracy of different state-of-the-art models compared with traditional data augmentation techniques. The public code is available on https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation.
翻译:尽管最近几年出现了强大的深度学习技术, 但仍需大量训练数据,以免过拟合。利用生成对抗网络 (GAN) 生成合成数据集是克服这一问题的最新方法,但尽管GAN技术有所发展,但常常很难训练或无法生成高质量的数据样本。为了解决这些问题,本文提出了一种基于扩散概率模型和 DPM-Solver++ 的环境声音分类增强技术,以实现快速采样。此外,为了确保生成的频谱图的质量,我们在数据集上训练了一个 Top-k 选择鉴别器。根据实验结果,我们得到的合成频谱图具有类似于原始数据集的特征,并且与其他传统数据增强技术相比,可以显著提高不同的最先进模型的分类精度。我们在 https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation 上公开了代码。