Deep neural networks have recently achieved breakthroughs in sound generation. Despite the outstanding sample quality, current sound generation models face issues when trained on small-scale datasets (e.g., overfitting and low coverage of sound classes), which significantly limits their performance. In this paper, we make the first attempt to investigate the benefits of pre-training for sound generation, using AudioLDM, a state-of-the-art audio generation model, as the backbone. Our study demonstrates the advantages of the pre-trained AudioLDM, especially in data-scarce scenarios. In addition, the baselines and evaluation protocols for sound generation systems are not consistent enough to compare different studies directly. To facilitate further study on sound generation, we benchmark the task on several frequently used datasets. We hope our results on transfer learning and our benchmarks can serve as references for future research on conditional sound generation.