We introduce an Extended Textual Conditioning space for text-to-image models, referred to as $P+$. This space consists of multiple textual conditions derived from per-layer prompts, each corresponding to one layer of the diffusion model's denoising U-Net. We show that the extended space provides greater disentanglement and control over image synthesis. We further introduce Extended Textual Inversion (XTI), in which images are inverted into $P+$ and represented by per-layer tokens. We show that XTI is more expressive and precise, and converges faster, than the original Textual Inversion (TI). The extended inversion method involves no noticeable trade-off between reconstruction and editability, and it induces more regular inversions. We conduct extensive experiments to analyze and understand the properties of the new space and to showcase the effectiveness of our method for personalizing text-to-image models. Furthermore, we use the unique properties of this space to achieve previously unattainable results in object-style mixing with text-to-image models. Project page: https://prompt-plus.github.io
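To make the difference between a single textual condition and $P+$ concrete, the following is a minimal illustrative sketch, not the implementation used in this work. It assumes a toy U-Net with five conditioned layers; the layer names, the list-of-floats embeddings, and the helpers `conditioning_p`, `conditioning_p_plus`, and `mix_conditions` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List

Embedding = List[float]

# Hypothetical layer names for a toy U-Net; a real model has its own layout.
LAYER_NAMES = ["down_0", "down_1", "mid", "up_0", "up_1"]


def conditioning_p(prompt_embedding: Embedding) -> Dict[str, Embedding]:
    """Single-condition case: every layer receives the same embedding."""
    return {name: list(prompt_embedding) for name in LAYER_NAMES}


def conditioning_p_plus(per_layer: Dict[str, Embedding]) -> Dict[str, Embedding]:
    """P+ case: each layer receives its own embedding (one per layer)."""
    missing = [name for name in LAYER_NAMES if name not in per_layer]
    if missing:
        raise KeyError(f"missing per-layer conditions for: {missing}")
    return {name: list(per_layer[name]) for name in LAYER_NAMES}


def mix_conditions(
    base: Dict[str, Embedding],
    other: Dict[str, Embedding],
    layers_from_other: Iterable[str],
) -> Dict[str, Embedding]:
    """Take the listed layers from `other` and all remaining layers from `base`.

    Which layers to swap is left to the caller; this sketch makes no claim
    about which layers govern which visual attributes.
    """
    chosen = set(layers_from_other)
    return {
        name: list(other[name] if name in chosen else base[name])
        for name in LAYER_NAMES
    }


if __name__ == "__main__":
    shared = conditioning_p([0.1, 0.2])
    per_layer = conditioning_p_plus(
        {name: [float(i), float(i)] for i, name in enumerate(LAYER_NAMES)}
    )
    mixed = mix_conditions(shared, per_layer, ["up_0", "up_1"])
    for name in LAYER_NAMES:
        print(name, mixed[name])
```

In this sketch, inverting an image into $P+$ would correspond to optimizing one embedding per entry of `LAYER_NAMES` rather than a single shared embedding, and mixing two concepts amounts to choosing, layer by layer, which set of embeddings to use.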