In this work, we propose and validate a framework that leverages language-image pretraining representations for training-free, zero-shot sketch-to-image synthesis. We show that disentangled content and style representations can guide pretrained image generators to act as sketch-to-image generators without (re-)training any parameters. Our approach disentangles style and content through elementary arithmetic on the representations of input sketches, assuming compositionality of the encoded information. Our results demonstrate that this approach is competitive with state-of-the-art instance-level open-domain sketch-to-image models, while depending only on pretrained off-the-shelf models and a fraction of the data.
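To make the arithmetic-based disentanglement concrete, the following is a minimal sketch of the idea using CLIP embeddings. It is not the authors' exact procedure: the prompt pair used to estimate a "sketch style" direction and the simple subtraction step are illustrative assumptions about how compositional structure in the embedding space could be exploited.

```python
# Minimal illustration (assumed, not the paper's exact method) of
# separating content from sketch style via elementary arithmetic on
# CLIP embeddings, assuming compositionality of the encoded information.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Embed the input sketch ("sketch.png" is a placeholder path).
sketch = preprocess(Image.open("sketch.png")).unsqueeze(0).to(device)
with torch.no_grad():
    sketch_emb = model.encode_image(sketch).float()

    # Estimate a "sketch style" direction from a text prompt pair;
    # this particular pair is an assumption for illustration.
    style_dir = (
        model.encode_text(clip.tokenize(["a pencil sketch"]).to(device))
        - model.encode_text(clip.tokenize(["a photograph"]).to(device))
    ).float()

# Elementary arithmetic: subtract the style component to approximate a
# style-free content embedding, then renormalize to the unit sphere.
content_emb = sketch_emb - style_dir
content_emb = content_emb / content_emb.norm(dim=-1, keepdim=True)

# content_emb could then condition a pretrained, CLIP-guided image
# generator to synthesize a realistic image of the sketched content,
# without (re-)training any generator parameters.
```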