The advent of open-source AI communities has produced a cornucopia of powerful text-guided diffusion models trained on various datasets. Yet few explorations have been conducted into ensembling such models to combine their strengths. In this work, we propose a simple yet effective method called Saliency-aware Noise Blending (SNB) that empowers fused text-guided diffusion models to achieve more controllable generation. Specifically, we experimentally find that the responses of classifier-free guidance are highly related to the saliency of generated images. Thus, we propose to trust different models in their areas of expertise by blending the predicted noises of two diffusion models in a saliency-aware manner. SNB is training-free and can be completed within a single DDIM sampling process. Additionally, it automatically aligns the semantics of the two noise spaces without requiring additional annotations such as masks. Extensive experiments demonstrate the effectiveness of SNB across a variety of applications. The project page is available at https://magicfusion.github.io/.
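The core idea can be illustrated with a minimal sketch. The following is not the authors' implementation; it assumes a saliency signal derived from the magnitude of each model's classifier-free guidance response (the difference between its conditional and unconditional noise predictions), which is then used to blend the two models' predicted noises per pixel at each DDIM step. The function names and the soft-gating normalization are illustrative choices, not part of the paper.

```python
import numpy as np

def saliency_mask(eps_cond, eps_uncond, tau=0.1):
    # Hypothetical saliency: magnitude of the classifier-free guidance
    # response |eps_cond - eps_uncond|, averaged over channels and
    # squashed to (0, 1) with a smooth sigmoid gate.
    resp = np.abs(eps_cond - eps_uncond).mean(axis=0)          # (H, W)
    resp = (resp - resp.min()) / (resp.max() - resp.min() + 1e-8)
    return 1.0 / (1.0 + np.exp(-(resp - 0.5) / tau))           # (H, W)

def blend_noise(eps_a, eps_b, mask):
    # Trust model A where its saliency is high, model B elsewhere.
    # eps_a, eps_b: predicted noises of shape (C, H, W); mask: (H, W).
    return mask[None] * eps_a + (1.0 - mask[None]) * eps_b
```

In a full sampler, the blended noise would replace the single-model prediction inside each DDIM update, so the fusion adds no training and no extra sampling passes.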