Diffusion models are emerging as a powerful solution for high-fidelity image generation, often exceeding GANs in quality. However, their slow training and inference speed is a major bottleneck that blocks their use in real-time applications. The recent DiffusionGAN method significantly reduces the models' running time by cutting the number of sampling steps from thousands to several, but its speed still lags far behind its GAN counterparts. This paper aims to close that speed gap by proposing a novel wavelet-based diffusion scheme. We extract low- and high-frequency components at both the image and feature levels via wavelet decomposition and adaptively handle these components for faster processing while maintaining good generation quality. Furthermore, we propose a reconstruction term that effectively boosts the convergence of model training. Experimental results on the CelebA-HQ, CIFAR-10, LSUN-Church, and STL-10 datasets show that our solution is a stepping stone toward real-time, high-fidelity diffusion models. Our code and pre-trained checkpoints will be available at \url{https://github.com/VinAIResearch/WaveDiff.git}.
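To make the wavelet decomposition concrete, below is a minimal PyTorch sketch of a single-level 2-D Haar transform that splits an image batch into one low-frequency subband (LL) and three high-frequency subbands (LH, HL, HH), together with its exact inverse. The function names (`haar_dwt`, `haar_idwt`) and the sign convention for the detail subbands are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def haar_dwt(x: torch.Tensor):
    """Single-level 2-D Haar wavelet decomposition (illustrative sketch).

    Splits a batch x of shape (B, C, H, W), with even H and W, into four
    half-resolution subbands: LL (low-frequency approximation) and
    LH/HL/HH (high-frequency details).
    """
    a = x[:, :, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, :, 0::2, 1::2]  # top-right
    c = x[:, :, 1::2, 0::2]  # bottom-left
    d = x[:, :, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2   # average: low-frequency content
    lh = (-a - b + c + d) / 2  # vertical detail
    hl = (-a + b - c + d) / 2  # horizontal detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt(ll, lh, hl, hh):
    """Exact inverse of haar_dwt: reassembles the full-resolution batch."""
    B, C, H, W = ll.shape
    x = ll.new_zeros(B, C, 2 * H, 2 * W)
    x[:, :, 0::2, 0::2] = (ll - lh - hl + hh) / 2
    x[:, :, 0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[:, :, 1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[:, :, 1::2, 1::2] = (ll + lh + hl + hh) / 2
    return x

# Usage: each subband has half the spatial resolution, so a diffusion
# backbone operating on the subbands processes 4x fewer pixels per map.
x = torch.randn(8, 3, 256, 256)
ll, lh, hl, hh = haar_dwt(x)
assert torch.allclose(haar_idwt(ll, lh, hl, hh), x, atol=1e-5)
```

Because the Haar transform is orthogonal and losslessly invertible, generating in the wavelet domain trades no information for the 4x reduction in spatial size per subband, which is the source of the claimed speedup.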