Despite steady advances in deep learning in recent years, models still need large amounts of training data to avoid overfitting. Synthetic datasets generated with generative adversarial networks (GANs) have recently been used to address this problem. However, GAN-based methods are often hard to train or fail to generate high-quality data samples. In this paper, we propose a data augmentation technique for environmental sound classification based on the diffusion probabilistic model, with DPM-Solver$++$ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we train a top-k selection discriminator on the dataset. Experimental results show that the synthesized spectrograms have features similar to those of the original dataset and can significantly increase the classification accuracy of different state-of-the-art models compared with traditional data augmentation techniques. The code is publicly available at \url{https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation}.