We empirically study the effect of noise scheduling strategies for denoising diffusion generative models. We report three findings: (1) the noise schedule is crucial for performance, and the optimal schedule depends on the task (e.g., image size); (2) as the image size increases, the optimal noise schedule shifts towards a noisier one (due to increased redundancy in pixels); and (3) simply scaling the input data by a factor of $b$ while keeping the noise schedule function fixed (equivalent to shifting the logSNR by $\log b$) is a good strategy across image sizes. This simple recipe, when combined with the recently proposed Recurrent Interface Network (RIN), yields state-of-the-art pixel-based diffusion models for high-resolution images on ImageNet, enabling single-stage, end-to-end generation of diverse and high-fidelity images at 1024$\times$1024 resolution (without upsampling/cascades).
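As an illustration of finding (3), the following is a minimal sketch of input scaling under a fixed noise schedule. It is not the authors' implementation: the cosine schedule, the function names (`cosine_gamma`, `noisy_input`, `log_snr`), and the unit-variance assumption on the data are all assumptions made for this example. The sketch only shows that scaling the clean input by $b$ changes the log signal-to-noise ratio by an offset that does not depend on the time step $t$.

```python
import numpy as np


def cosine_gamma(t):
    """Assumed schedule for illustration: gamma(t) = cos(pi/2 * t)^2, t in (0, 1)."""
    return np.cos(0.5 * np.pi * t) ** 2


def noisy_input(x0, t, b=1.0, rng=None):
    """Corrupt x0 at time t, scaling the clean signal by b (hypothetical helper).

    x_t = sqrt(gamma(t)) * b * x0 + sqrt(1 - gamma(t)) * eps,  eps ~ N(0, I)
    """
    rng = np.random.default_rng() if rng is None else rng
    gamma = cosine_gamma(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(gamma) * b * x0 + np.sqrt(1.0 - gamma) * eps


def log_snr(t, b=1.0):
    """Log ratio of signal power to noise power, assuming unit-variance data."""
    gamma = cosine_gamma(t)
    return np.log(b ** 2 * gamma / (1.0 - gamma))


# Usage: a smaller b gives a noisier input at every time step.
x0 = np.zeros((3, 64, 64))  # placeholder image, channels first
xt = noisy_input(x0, t=0.5, b=0.5, rng=np.random.default_rng(0))
offsets = [log_snr(t, b=0.5) - log_snr(t, b=1.0) for t in (0.2, 0.5, 0.8)]
```

In this sketch the schedule function `cosine_gamma` is never modified; only `b` changes, which is what makes the recipe easy to apply across image sizes.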