We present VoloGAN, an adversarial domain adaptation network that translates synthetic RGB-D images of a high-quality 3D model of a person into RGB-D images as they would be captured with a consumer depth sensor. The system is especially useful for generating large amounts of training data for single-view 3D reconstruction algorithms that replicate real-world capture conditions, since it can imitate the style of different sensor types for the same high-end 3D model database. The network uses a CycleGAN framework with a U-Net architecture for the generator and a discriminator inspired by SIV-GAN. We use different optimizers and learning rate schedules to train the generator and the discriminator. We further construct a loss function that considers image channels individually and, among other metrics, evaluates structural similarity. We demonstrate that CycleGANs can be used for adversarial domain adaptation of synthetic 3D data to train a volumetric video generator model with only a few training samples.
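As a rough illustration of the per-channel loss described above, the sketch below combines an L1 term with a structural-similarity (SSIM) term evaluated on each RGB-D channel individually. The uniform SSIM window, the channel weights, and the function names are our own simplifying assumptions for exposition, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, window_size=11, c1=0.01**2, c2=0.03**2):
    # Mean SSIM over a single-channel batch (N, 1, H, W), inputs scaled to [0, 1].
    # A uniform window is used instead of the usual Gaussian for brevity.
    mu_x = F.avg_pool2d(x, window_size, stride=1)
    mu_y = F.avg_pool2d(y, window_size, stride=1)
    sigma_x = F.avg_pool2d(x * x, window_size, stride=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window_size, stride=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window_size, stride=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean()

def per_channel_loss(fake, real, w_rgb=1.0, w_depth=1.0):
    # Evaluate each RGB-D channel individually: L1 plus (1 - SSIM) per channel,
    # with separate weights for colour and depth (the weights are assumptions).
    loss = 0.0
    for c in range(fake.shape[1]):
        w = w_depth if c == 3 else w_rgb  # channel 3 is depth in RGB-D
        f, r = fake[:, c:c + 1], real[:, c:c + 1]
        loss = loss + w * (F.l1_loss(f, r) + (1.0 - ssim(f, r)))
    return loss
```

Treating the depth channel separately from the colour channels is what lets a single reconstruction loss account for the very different noise characteristics of the two modalities.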
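Similarly, the use of different optimizers and learning rate schedules for the two networks might be set up along the following lines in PyTorch. The specific optimizer choices, hyperparameters, and schedules shown here are placeholder assumptions, as are the trivial stand-in modules for the U-Net generator and the SIV-GAN-style discriminator.

```python
import torch
import torch.nn as nn

# Trivial stand-ins for the paper's U-Net generator and SIV-GAN-style
# discriminator; the real architectures are not reproduced here.
G = nn.Sequential(nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 4, 3, padding=1))
D = nn.Sequential(nn.Conv2d(4, 64, 4, stride=2, padding=1),
                  nn.LeakyReLU(0.2), nn.Conv2d(64, 1, 4))

# Separate optimizers for generator and discriminator; the choices below
# (Adam for G, SGD for D) and all hyperparameters are illustrative only.
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.SGD(D.parameters(), lr=1e-4, momentum=0.9)

# Independent learning-rate schedules for each network: a linear decay
# for G and a stepwise decay for D (assumed schedules).
sched_G = torch.optim.lr_scheduler.LambdaLR(opt_G, lambda e: max(0.0, 1.0 - e / 100))
sched_D = torch.optim.lr_scheduler.StepLR(opt_D, step_size=30, gamma=0.5)

for epoch in range(100):
    ...  # alternate D and G updates as in standard CycleGAN training
    sched_G.step()
    sched_D.step()
```

Decoupling the two training regimes is a common way to keep an expressive discriminator from overpowering the generator when, as here, only a few training samples are available.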