Conditional diffusion models have recently gained popularity in numerous applications thanks to their exceptional generation ability. However, many existing methods require training: they must train a time-dependent classifier or a condition-dependent score estimator, which raises the cost of building a conditional diffusion model and makes it inconvenient to transfer across conditions. Some recent works address this limitation with training-free solutions, but most apply only to a specific category of tasks rather than to general conditions. In this work, we propose a training-Free conditional Diffusion Model (FreeDoM) that supports a variety of conditions. Specifically, we leverage off-the-shelf pre-trained networks, such as a face detection model, to construct time-independent energy functions that guide the generation process without any additional training. Because the energy function can be constructed flexibly and adapted to many kinds of conditions, FreeDoM covers a broader range of applications than existing training-free methods. FreeDoM is simple, effective, and low-cost. Experiments demonstrate that FreeDoM is effective for various conditions and suits diffusion models in diverse data domains, including the image and latent code domains.
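To make the idea of guiding generation with a time-independent energy function concrete, here is a minimal sketch on a scalar toy problem. Everything in it is an illustrative assumption rather than the authors' implementation: `pretrained_feature` is a hypothetical stand-in for a frozen off-the-shelf network, the squared-distance energy is one possible choice, the clean-sample estimate uses the standard DDPM formula, the gradient is taken by finite differences with the predicted noise held fixed, and `rho` is a hypothetical guidance step size.

```python
import math


def pretrained_feature(x0):
    """Hypothetical stand-in for a frozen, off-the-shelf network."""
    return math.tanh(x0)


def energy(x0, condition):
    """Time-independent energy: small when the feature of x0 matches the condition."""
    return (pretrained_feature(x0) - condition) ** 2


def predict_clean(x_t, alpha_bar, eps):
    """Standard DDPM estimate of the clean sample from x_t and the predicted noise."""
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)


def energy_grad(x_t, alpha_bar, eps, condition, h=1e-5):
    """Central finite-difference gradient of the energy w.r.t. x_t (eps held fixed)."""
    e_plus = energy(predict_clean(x_t + h, alpha_bar, eps), condition)
    e_minus = energy(predict_clean(x_t - h, alpha_bar, eps), condition)
    return (e_plus - e_minus) / (2.0 * h)


def guided_step(x_uncond_next, x_t, alpha_bar, eps, condition, rho):
    """Shift the unconditional update down the energy gradient (assumed update rule)."""
    return x_uncond_next - rho * energy_grad(x_t, alpha_bar, eps, condition)


if __name__ == "__main__":
    # Toy values: the unconditional sampler would stay at 0; guidance moves it.
    x_next = guided_step(0.0, x_t=0.0, alpha_bar=0.5, eps=0.0, condition=0.5, rho=0.1)
    print(x_next, energy(predict_clean(x_next, 0.5, 0.0), 0.5))
```

In this sketch the energy never takes the timestep as an input, and the only learned component is used frozen. A real system would presumably obtain the gradient by automatic differentiation through the network rather than by finite differences.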