Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Diffusion models are capable of capturing the entire (conditional) data distribution. However, because training and data are insufficient to cover low-probability regions, the model is penalized for failing to generate high-quality images corresponding to these regions. To improve generation quality, guidance strategies such as classifier-free guidance (CFG) steer samples toward high-probability regions during sampling. However, standard CFG often leads to over-simplified or distorted samples. An alternative line of work guides a diffusion model with a bad version of itself, but it is limited by carefully designed degradation strategies, extra training, and additional sampling steps. In this paper, we propose a simple yet effective strategy, Internal Guidance (IG), which introduces auxiliary supervision on an intermediate layer during training and extrapolates the outputs of the intermediate and deep layers to obtain generative results during sampling. This simple strategy yields significant improvements in both training efficiency and generation quality on various baselines. On ImageNet 256x256, SiT-XL/2+IG achieves FID=5.31 at 80 epochs and FID=1.75 at 800 epochs. More impressively, LightningDiT-XL/1+IG achieves FID=1.34, outperforming all of these methods by a large margin. Combined with CFG, LightningDiT-XL/1+IG achieves the current state-of-the-art FID of 1.19.
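The abstract does not state the extrapolation formula, so the following is only a minimal sketch under an assumption: that IG combines the two outputs with the same linear rule commonly used for CFG, namely weak + w * (strong - weak). The function names `extrapolate` and `internal_guidance` and the scale `w` are hypothetical and are not taken from the paper.

```python
def extrapolate(weak, strong, w):
    """Linear extrapolation from `weak` through `strong` with scale `w`.

    w = 0 returns the weak prediction, w = 1 returns the strong one,
    and w > 1 pushes past the strong prediction, away from the weak one.
    """
    return [a + w * (b - a) for a, b in zip(weak, strong)]


def internal_guidance(intermediate_out, deep_out, w):
    # ASSUMPTION: the intermediate-layer output plays the "weak" role and the
    # deep-layer output plays the "strong" role. The source only says the two
    # outputs are extrapolated; the exact form here is illustrative.
    return extrapolate(intermediate_out, deep_out, w)


# Toy predictions standing in for per-element model outputs.
intermediate = [0.0, 1.0, 2.0]
deep = [1.0, 1.0, 4.0]
guided = internal_guidance(intermediate, deep, 1.5)
```

With `w = 1.5`, each guided value moves half a step beyond the deep-layer value in the direction away from the intermediate one. Elements where the two outputs agree are left unchanged.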

