Generative Adversarial Networks (GANs) can produce images of remarkable complexity and realism, but they are generally structured to sample from a single latent source, ignoring the explicit spatial interactions between the multiple entities that may be present in a scene. Capturing such complex interactions between different objects in the world, including their relative scaling, spatial layout, occlusion, or viewpoint transformation, is a challenging problem. In this work, we propose a novel self-consistent Composition-by-Decomposition (CoDe) network to compose a pair of objects. Given object images from two distinct distributions, our model can generate a realistic composite image from their joint distribution that follows the texture and shape of the input objects. We evaluate our approach through qualitative experiments and user evaluations. Our results indicate that the learned model captures potential interactions between the two object domains and generates realistic composed scenes at test time.
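To make the self-consistency idea concrete, below is a minimal sketch of a composition-by-decomposition training step, assuming a PyTorch-style setup. The module names `ComposeNet` and `DecomposeNet`, the network shapes, and the use of an L1 reconstruction loss are illustrative assumptions, not the paper's actual architecture; a full implementation would also include adversarial losses on the composite image.

```python
# Hypothetical sketch of a CoDe-style self-consistency step; names and
# architectures are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class ComposeNet(nn.Module):
    """Maps a pair of object images (channel-concatenated) to a composite."""
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, a, b):
        return self.net(torch.cat([a, b], dim=1))

class DecomposeNet(nn.Module):
    """Maps a composite image back to the two constituent object images."""
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2 * ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, c):
        out = self.net(c)
        ch = out.shape[1] // 2
        return out[:, :ch], out[:, ch:]

compose, decompose = ComposeNet(), DecomposeNet()
l1 = nn.L1Loss()
opt = torch.optim.Adam(
    list(compose.parameters()) + list(decompose.parameters()), lr=2e-4
)

# One illustrative step on random stand-in images in [-1, 1].
a = torch.rand(4, 3, 64, 64) * 2 - 1   # object from domain A
b = torch.rand(4, 3, 64, 64) * 2 - 1   # object from domain B
comp = compose(a, b)                   # composition path
a_rec, b_rec = decompose(comp)         # decomposition path
# Self-consistency: decomposing the composite should recover both inputs.
loss = l1(a_rec, a) + l1(b_rec, b)
opt.zero_grad()
loss.backward()
opt.step()
```

The self-consistency constraint is what ties the two paths together: the composition network is free to rescale, occlude, or reposition the objects, but only in ways the decomposition network can invert back to the original inputs.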