Photorealistic image generation from simulated label maps is needed in several contexts, such as medical training in virtual reality. With conventional deep learning methods, this task requires images paired with semantic annotations, which are typically unavailable. We introduce a contrastive learning framework for generating photorealistic images from simulated label maps by learning from unpaired sets of both. Because scenes may differ substantially between real images and label maps, existing unpaired image translation methods produce scene-modification artifacts in the synthesized images. We use simulated images as surrogate targets for a contrastive loss, while ensuring consistency through features from a reverse translation network. Our method enables bidirectional label-image translation, which we demonstrate across a variety of scenarios and datasets, including laparoscopy, ultrasound, and driving scenes. Compared with state-of-the-art unpaired translation methods, the proposed method is shown to generate realistic and scene-accurate translations.
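To make the surrogate-target idea concrete, below is a minimal sketch of a patchwise InfoNCE-style contrastive loss in which co-located feature patches of the simulated image serve as positives for the synthesized image and other sampled locations serve as negatives. The function name, temperature, and sampling scheme are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_synth, feat_sim, num_patches=256, tau=0.07):
    """Patchwise InfoNCE loss using simulated-image features as
    surrogate positives. Inputs are (B, C, H, W) feature maps from
    a shared encoder; all names and defaults here are illustrative."""
    B, C, H, W = feat_synth.shape
    n = min(num_patches, H * W)
    # Flatten spatial positions to (B, H*W, C) token sequences.
    q = feat_synth.flatten(2).permute(0, 2, 1)
    k = feat_sim.flatten(2).permute(0, 2, 1)
    # Sample the same random locations in both feature maps so each
    # query's positive is the co-located simulated-image patch.
    idx = torch.randperm(H * W, device=q.device)[:n]
    q = F.normalize(q[:, idx], dim=-1)  # queries: synthesized image
    k = F.normalize(k[:, idx], dim=-1)  # keys: simulated surrogate
    # Score each query against all sampled keys; the co-located key
    # is the positive, the remaining locations act as negatives.
    logits = torch.bmm(q, k.transpose(1, 2)) / tau        # (B, n, n)
    labels = torch.arange(n, device=q.device).expand(B, n)
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())
```

In this sketch, using the simulated image rather than an unpaired real image as the key source is what avoids pulling the synthesized output toward a different scene, which is the source of the scene-modification artifacts described above.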