Research in unpaired video translation has mainly focused on short-term temporal consistency by conditioning on neighboring frames. However, for transfer from simulated to photorealistic sequences, the available information on the underlying geometry offers the potential to achieve global consistency across views. We propose a novel approach that combines unpaired image translation with neural rendering to transfer simulated surgical abdominal scenes to photorealistic ones. By introducing global learnable textures and a lighting-invariant view-consistency loss, our method produces consistent translations of arbitrary views and thus enables long-term consistent video synthesis. We design and test our model to generate video sequences of minimally invasive surgical abdominal scenes. Because labeled data is often limited in this domain, photorealistic data that preserves ground-truth information from the simulated domain is especially relevant. By extending existing image-based methods to view-consistent videos, we aim to broaden the applicability of simulated training and evaluation environments for surgical applications. Code and data: http://opencas.dkfz.de/video-sim2real.
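The abstract mentions a lighting-invariant view-consistency loss but does not define it. As an illustration only, the sketch below shows one plausible way such a loss could be formed when the simulated geometry provides pixel correspondences between two views: the translation of one view is warped into the other, both images are locally normalized to discount lighting differences, and the masked L1 difference is penalized. All function names, arguments, and the local-normalization choice are assumptions for this sketch, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F


def view_consistency_loss(img_a, img_b, grid_ab, valid_mask):
    """Hypothetical lighting-invariant view-consistency loss (sketch).

    img_a, img_b : translated images of the same scene from two views, (B, 3, H, W)
    grid_ab      : normalized sampling grid mapping view-A pixels to view B, (B, H, W, 2),
                   assumed to come from the known simulated geometry
    valid_mask   : 1 where the geometry yields a valid correspondence, (B, 1, H, W)
    """
    # Warp the translation of view B into view A using the known correspondences.
    img_b_warped = F.grid_sample(img_b, grid_ab, align_corners=False)

    # One possible lighting-invariance choice: compare locally normalized
    # images instead of raw colors, removing low-frequency illumination.
    def local_normalize(x, k=15):
        mean = F.avg_pool2d(x, k, stride=1, padding=k // 2)
        var = F.avg_pool2d(x ** 2, k, stride=1, padding=k // 2) - mean ** 2
        return (x - mean) / var.clamp(min=1e-6).sqrt()

    diff = (local_normalize(img_a) - local_normalize(img_b_warped)).abs()
    return (diff * valid_mask).sum() / valid_mask.sum().clamp(min=1.0)
```

In this sketch the loss is symmetric in spirit (it could also be applied in the B-to-A direction) and relies entirely on the simulator's geometry for correspondences; how the actual method defines lighting invariance and aggregates views is specified in the paper, not here.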