We empirically study the effect of noise scheduling strategies on denoising diffusion generative models. We report three findings: (1) noise scheduling is crucial for performance, and the optimal schedule depends on the task (e.g., image size); (2) as image size increases, the optimal noise schedule shifts towards a noisier one (due to increased redundancy in pixels); and (3) simply scaling the input data by a factor of $b$ while keeping the noise schedule function fixed (equivalent to shifting the logSNR by $\log b$) is a good strategy across image sizes. This simple recipe, when combined with the recently proposed Recurrent Interface Network (RIN), yields state-of-the-art pixel-based diffusion models for high-resolution images on ImageNet, enabling single-stage, end-to-end generation of diverse and high-fidelity images at 1024$\times$1024 resolution for the first time (without upsampling/cascades).
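The input-scaling strategy in finding (3) can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine schedule, the forward process $x_t = \sqrt{\gamma(t)}\, b\, x_0 + \sqrt{1-\gamma(t)}\, \epsilon$, and the names `cosine_gamma`, `noised_input`, and `log_snr` are assumptions made for illustration. Under the further assumptions that the data has unit variance and that SNR is a power ratio, scaling by $b$ shifts the logSNR by the constant $2\log b$ at every $t$; an amplitude-ratio convention gives $\log b$ instead.

```python
import numpy as np


def cosine_gamma(t):
    """Assumed cosine schedule: gamma(t) = cos(pi/2 * t)^2 for t in [0, 1]."""
    return np.cos(0.5 * np.pi * t) ** 2


def noised_input(x0, t, b=1.0, rng=None):
    """Assumed forward process with input scaling:
    x_t = sqrt(gamma(t)) * b * x0 + sqrt(1 - gamma(t)) * eps.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    gamma = cosine_gamma(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(gamma) * b * x0 + np.sqrt(1.0 - gamma) * eps


def log_snr(t, b=1.0):
    """Power-ratio logSNR for unit-variance data:
    log(b^2 * gamma(t) / (1 - gamma(t))).
    """
    gamma = cosine_gamma(t)
    return np.log(b ** 2 * gamma) - np.log1p(-gamma)


# The schedule function is unchanged; only the scale factor b differs.
t = np.linspace(0.1, 0.9, 5)
shift = log_snr(t, b=0.5) - log_snr(t, b=1.0)

# A smaller b lowers the logSNR by the same amount at every t,
# i.e. the same schedule becomes uniformly noisier.
x_t = noised_input(np.ones((4, 4)), t=0.5, b=0.5)
```

Here `shift` is the same at every sampled `t`, which is the sense in which scaling the input is a shift of the logSNR curve rather than a change to its shape.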