Large, text-conditioned generative diffusion models have recently gained a lot of attention for their impressive performance in generating high-fidelity images from text alone. However, achieving high-quality results in a single attempt is rarely feasible. Instead, text-guided image generation involves the user making many slight changes to the inputs in order to iteratively carve out the envisioned image. Yet slight changes to the input prompt often lead to entirely different images being generated, limiting the granularity of the artist's control. To provide flexibility, we present the Stable Artist, an image editing approach enabling fine-grained control of the image generation process. Its main component is semantic guidance (SEGA), which steers the diffusion process along a variable number of semantic directions. This allows for subtle edits to images, changes in composition and style, as well as optimization of the overall artistic conception. Furthermore, SEGA enables probing of latent spaces to gain insights into the representation of concepts learned by the model, even complex ones such as 'carbon emission'. We demonstrate the Stable Artist on several tasks, showcasing high-quality image editing and composition.
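To make the guidance mechanism concrete, the following is a minimal sketch of how semantic directions could be combined with classifier-free guidance in noise-estimate space. It is not the paper's implementation: the function name `sega_guidance`, the scale parameters `s_g` and `s_e`, and the quantile-based masking are illustrative assumptions, and random arrays stand in for the U-Net's actual noise predictions.

```python
import numpy as np

def sega_guidance(eps_uncond, eps_prompt, eps_concepts, directions,
                  s_g=7.5, s_e=5.0, threshold=0.9):
    """Hypothetical sketch of SEGA-style semantic guidance.

    Combines the unconditioned and prompt-conditioned noise estimates
    (classifier-free guidance) with one additional guidance term per
    editing concept, applied only where the concept signal is strongest.
    """
    # Standard classifier-free guidance term.
    guided = eps_uncond + s_g * (eps_prompt - eps_uncond)
    for eps_c, direction in zip(eps_concepts, directions):
        # The difference between concept-conditioned and unconditioned
        # estimates defines a semantic direction in noise space.
        diff = eps_c - eps_uncond
        # Assumed masking scheme: keep only the largest-magnitude entries
        # (upper quantile) so each edit stays localized in the image.
        mask = np.abs(diff) >= np.quantile(np.abs(diff), threshold)
        # direction = +1 adds the concept, -1 removes it.
        guided = guided + direction * s_e * diff * mask
    return guided

# Toy usage with random stand-ins for the model's noise predictions.
shape = (4, 64, 64)
eps_u = np.random.randn(*shape)
eps_p = np.random.randn(*shape)
eps_cs = [np.random.randn(*shape)]  # one editing concept
print(sega_guidance(eps_u, eps_p, eps_cs, directions=[+1]).shape)
```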