Denoising diffusion models have shown remarkable capabilities in generating realistic, high-quality, and diverse images. However, the extent of controllability during generation remains underexplored. Inspired by techniques for image manipulation in GAN latent spaces, we train a diffusion model conditioned on two latent codes: a spatial content mask and a flattened style embedding. We rely on the inductive bias of the progressive denoising process of diffusion models to encode pose/layout information in the spatial content mask and semantic/style information in the style code. We propose two generic sampling techniques for improving controllability. We extend composable diffusion models to allow for some dependence between conditional inputs, improving generation quality while also providing control over the amount of guidance from each latent code and their joint distribution. We also propose timestep-dependent weight scheduling for the content and style latents to further improve the translations. We observe better controllability compared to existing methods and show that, without explicit training objectives, diffusion models can be used for effective image manipulation and image translation.
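The two sampling techniques named above can be illustrated together. The sketch below is a minimal, assumption-laden illustration, not the paper's exact formulation: it combines a content-conditioned and a style-conditioned noise estimate in the classifier-free-guidance style used by composable diffusion models, with a hypothetical linear weight schedule in which content guidance dominates early (high-noise) steps that fix layout, and style guidance dominates later steps. The function names and the linear crossover are illustrative assumptions.

```python
import numpy as np


def guidance_weights(t, num_steps, w_content_max=1.0, w_style_max=1.0):
    """Timestep-dependent weights for content/style guidance (a sketch).

    t = num_steps - 1 is the noisiest step (start of sampling), t = 0 the
    last step. A simple linear crossover is assumed here; the actual
    schedule used in the paper may differ.
    """
    frac = t / (num_steps - 1)  # 1.0 at the noisiest step, 0.0 at the end
    return w_content_max * frac, w_style_max * (1.0 - frac)


def combine_scores(eps_uncond, eps_content, eps_style, t, num_steps):
    """Composable-diffusion-style combination of two conditional noise
    estimates. Each conditional term is weighted relative to the
    unconditional estimate, with timestep-dependent weights."""
    w_c, w_s = guidance_weights(t, num_steps)
    return (eps_uncond
            + w_c * (eps_content - eps_uncond)
            + w_s * (eps_style - eps_uncond))
```

At each denoising step, a sampler would call `combine_scores` with the three noise predictions from the model (unconditional, content-conditioned, style-conditioned) and use the result in place of a single conditional estimate.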