Recent large-scale generative models trained on big data are capable of synthesizing incredible images yet suffer from limited controllability. This work offers a new generation paradigm that allows flexible control of the output image, such as its spatial layout and palette, while maintaining the synthesis quality and model creativity. With compositionality as the core idea, we first decompose an image into representative factors, and then train a diffusion model with all these factors as conditions to recompose the input. At the inference stage, the rich intermediate representations work as composable elements, leading to a huge design space (i.e., exponentially proportional to the number of decomposed factors) for customizable content creation. Notably, our approach, which we call Composer, supports conditions at various levels, such as text descriptions as global information, depth maps and sketches as local guidance, and color histograms for low-level details. Beyond improving controllability, we confirm that Composer serves as a general framework and facilitates a wide range of classical generative tasks without retraining. Code and models will be made available.
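To make the decompose-then-recompose idea concrete, the sketch below shows one plausible way a diffusion denoiser could accept multiple decomposed factors as optional conditions. The class name, factor encoders, embedding sizes, and fusion scheme are illustrative assumptions, not the authors' actual architecture; global factors (text embedding, color histogram) are fused into the timestep embedding, while local factors (depth, sketch) are concatenated with the noisy image.

```python
# Hypothetical sketch of multi-factor conditioning in the spirit of Composer.
# All module names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class FactorConditionedDenoiser(nn.Module):
    def __init__(self, img_channels=3, cond_dim=256):
        super().__init__()
        # Global conditions are projected to a shared embedding space and
        # added to the timestep embedding (assumed sizes: 768-d text, 64-bin histogram).
        self.text_proj = nn.Linear(768, cond_dim)
        self.palette_proj = nn.Linear(64, cond_dim)
        self.time_embed = nn.Sequential(
            nn.Linear(1, cond_dim), nn.SiLU(), nn.Linear(cond_dim, cond_dim))
        # Local conditions (depth map, sketch) are concatenated channel-wise
        # with the noisy image before the first convolution.
        local_channels = 1 + 1
        self.backbone = nn.Conv2d(img_channels + local_channels,
                                  img_channels, kernel_size=3, padding=1)

    def forward(self, x_t, t, text_emb=None, palette=None, depth=None, sketch=None):
        b, _, h, w = x_t.shape
        emb = self.time_embed(t.float().view(b, 1))
        # Each factor is optional; dropping a factor simply removes its term,
        # which is what makes the factors freely composable at inference time.
        if text_emb is not None:
            emb = emb + self.text_proj(text_emb)
        if palette is not None:
            emb = emb + self.palette_proj(palette)
        zeros = lambda c: torch.zeros(b, c, h, w, device=x_t.device)
        local = torch.cat([depth if depth is not None else zeros(1),
                           sketch if sketch is not None else zeros(1)], dim=1)
        h_in = torch.cat([x_t, local], dim=1)
        # A real model would inject `emb` into every block of a UNet; here it is
        # broadcast-added once only to keep the sketch short.
        return self.backbone(h_in) + emb.mean(dim=1).view(b, 1, 1, 1)
```

At inference, any subset of the conditions can be supplied (e.g., only a sketch plus a color histogram), and the remaining factors are left unconstrained, which is what yields the combinatorial design space described above.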