Generating images from hand-drawings is a crucial and fundamental task in content creation. The translation is difficult because there are infinitely many plausible outputs and different users usually expect different outcomes. We therefore propose a unified, diffusion-based framework that supports three-dimensional control over image synthesis from sketches and strokes. Users can decide not only the level of faithfulness to the input sketches and strokes, but also the degree of realism, since user inputs are usually not consistent with real images. Qualitative and quantitative experiments demonstrate that our framework achieves state-of-the-art performance while offering the flexibility to generate customized images with control over shape, color, and realism. Moreover, our method enables applications such as editing real images, generation from partial sketches and strokes, and multi-domain, multi-modal synthesis.