We address the task of multi-view image-to-image translation for person image generation. The goal is to synthesize photo-realistic multi-view images that are pose-consistent across all views. Our end-to-end framework jointly learns multiple unpaired image-to-image translation models, one per camera viewpoint. The joint learning is enforced through constraints on the shared 3D human pose, which encourage the 2D pose projections in all views to be consistent. Experimental results on the CMU-Panoptic dataset demonstrate the effectiveness of the proposed framework: it generates photo-realistic images of persons in new poses that are more consistent across views than those of a standard image-to-image baseline. The code is available at: https://github.com/sony-si/MultiView-Img2Img
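The cross-view constraint described above can be illustrated with a minimal sketch: a shared 3D pose is projected into each camera view, and the deviation between those projections and the per-view 2D pose estimates is penalized. This is an assumption-laden illustration, not the authors' implementation; the function names, the use of 3x4 camera matrices, and the mean-squared-error penalty are all hypothetical choices for exposition.

```python
import numpy as np

def project_points(pose_3d, cam_matrix):
    """Project 3D joints (J, 3) into a view using a 3x4 camera matrix.

    Hypothetical pinhole projection: homogenize, multiply, divide by depth.
    """
    homo = np.hstack([pose_3d, np.ones((pose_3d.shape[0], 1))])  # (J, 4)
    proj = homo @ cam_matrix.T                                   # (J, 3)
    return proj[:, :2] / proj[:, 2:3]                            # (J, 2)

def multiview_pose_consistency(pose_3d, cam_matrices, poses_2d):
    """Mean squared distance between projections of a shared 3D pose and
    the 2D poses produced in each view (one translation model per view).
    A loss of 0 means all views agree with the shared 3D pose."""
    return float(np.mean([
        np.mean((project_points(pose_3d, P) - p2d) ** 2)
        for P, p2d in zip(cam_matrices, poses_2d)
    ]))
```

During joint training, a term of this form would be added to each per-view translation loss so that gradients from all views pull toward a single consistent 3D pose.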