Recent advances in generative adversarial networks (GANs) have achieved great success in automated image composition, which generates new images by automatically embedding foreground objects of interest into background images. However, most existing works deal with foreground objects in two-dimensional (2D) images, even though foreground objects in three-dimensional (3D) models are more flexible, offering 360-degree view freedom. This paper presents an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically. VA-GAN consists of a texture generator and a differential discriminator that are inter-connected and end-to-end trainable. The differential discriminator guides the learning of geometric transformations from background images so that the composed 3D models can be aligned with the background images with realistic poses and views. The texture generator adopts a novel view encoding mechanism to generate accurate object textures for the 3D models under the estimated views. Extensive experiments over two synthesis tasks (car synthesis with KITTI and pedestrian synthesis with Cityscapes) show that VA-GAN achieves high-fidelity composition, both qualitatively and quantitatively, as compared with state-of-the-art generation methods.