We present CoGS, a novel method for the style-conditioned, sketch-driven synthesis of images. CoGS enables exploration of diverse appearance possibilities for a given sketched object, providing decoupled control over the structure and the appearance of the output. Coarse-grained control over object structure and appearance is provided by an input sketch and an exemplar "style" conditioning image, which a transformer-based sketch and style encoder maps to a discrete codebook representation. We map the codebook representation into a metric space, enabling fine-grained control over the selection of, and interpolation between, multiple synthesis options before generating the image via a vector-quantized GAN (VQGAN) decoder. Our framework thereby unifies search and synthesis: a sketch and style pair may be used to run an initial synthesis, which may then be refined by combining it with similar results from a search corpus to produce an image more closely matching the user's intent. We show that our model, trained on the 125 object classes of our newly created Pseudosketches dataset, is capable of producing a diverse gamut of semantic content and appearance styles.
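To make the pipeline concrete, the following is a minimal PyTorch sketch of the stages described above: a transformer encoder over sketch and style tokens, quantization against a learned codebook, projection into a metric space where alternative results can be interpolated, and a decoder that renders the image. All module choices (a vanilla `nn.TransformerEncoder` standing in for the sketch-and-style encoder, a small transposed-convolution stack standing in for the VQGAN decoder) and all dimensions are illustrative assumptions, not the released implementation.

```python
# Illustrative sketch of the CoGS pipeline; architecture details are assumptions.
import torch
import torch.nn as nn

class CoGSPipelineSketch(nn.Module):
    def __init__(self, token_dim=256, metric_dim=128, codebook_size=1024, num_tokens=256):
        super().__init__()
        self.token_dim, self.num_tokens = token_dim, num_tokens
        # Patch embeddings for the sketch (1-channel) and style exemplar (3-channel),
        # assuming 256x256 inputs tokenized into a 16x16 grid.
        self.sketch_embed = nn.Conv2d(1, token_dim, kernel_size=16, stride=16)
        self.style_embed = nn.Conv2d(3, token_dim, kernel_size=16, stride=16)
        # Transformer encoder jointly attending over sketch and style tokens.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=token_dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        # Learned codebook; encoder outputs are snapped to their nearest entry.
        self.codebook = nn.Embedding(codebook_size, token_dim)
        # Projection into a metric space for selection/interpolation of results.
        self.to_metric = nn.Linear(token_dim, metric_dim)
        self.from_metric = nn.Linear(metric_dim, token_dim)
        # Stand-in decoder mapping the 16x16 token grid back to a 256x256 image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(token_dim, 64, 4, stride=4), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=4), nn.Tanh(),
        )

    def forward(self, sketch, style, alt_embedding=None, alpha=0.0):
        s = self.sketch_embed(sketch).flatten(2).transpose(1, 2)   # (B, 256, D)
        t = self.style_embed(style).flatten(2).transpose(1, 2)     # (B, 256, D)
        tokens = self.encoder(torch.cat([s, t], dim=1))[:, : self.num_tokens]
        # Discrete codebook representation: nearest-entry quantization
        # (straight-through gradient estimation omitted for brevity).
        book = self.codebook.weight.unsqueeze(0).expand(tokens.size(0), -1, -1)
        tokens = self.codebook(torch.cdist(tokens, book).argmin(dim=-1))
        z = self.to_metric(tokens)                                  # metric-space embedding
        if alt_embedding is not None:
            # Interpolate with an embedding retrieved from a search corpus.
            z = (1 - alpha) * z + alpha * alt_embedding
        h = self.from_metric(z).transpose(1, 2)
        h = h.reshape(-1, self.token_dim, 16, 16)
        return self.decoder(h)                                      # (B, 3, 256, 256)
```

For example, `CoGSPipelineSketch()(torch.randn(1, 1, 256, 256), torch.randn(1, 3, 256, 256))` returns a `(1, 3, 256, 256)` image tensor; passing `alt_embedding` and a nonzero `alpha` blends the initial synthesis with a retrieved alternative in the metric space.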