In this paper, we present a compositing image synthesis method that generates RGB canvases with well-aligned segmentation maps and sparse depth maps, coupled with an inpainting network that transforms the RGB canvases into high-quality RGB images and the sparse depth maps into pixel-wise dense depth maps. We benchmark our method in terms of structural alignment and image quality, showing an increase in mIoU of 3.7 percentage points over the state of the art and a highly competitive FID. Furthermore, we analyse the quality of the generated data as training data for semantic segmentation and depth completion, and show that our approach is better suited for this purpose than other methods.
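The structural-alignment metric mentioned above, mIoU, can be illustrated with a small sketch. The abstract does not describe the evaluation protocol, so the setup here is an assumption: two label maps of equal shape are compared (for example, a segmentation predicted on a generated image against the segmentation map it was generated with), and classes absent from both maps are skipped. The function name `mean_iou` and its parameters are hypothetical.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    num_classes: int,
) -> float:
    """Mean intersection-over-union between two label maps of equal shape.

    Assumption: classes that appear in neither map are excluded from the mean.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: it counts once for both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it belongs to the union of both classes.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no class is present in either label map")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    predicted = [[0, 0], [1, 1]]
    reference = [[0, 1], [1, 1]]
    print(mean_iou(predicted, reference, num_classes=2))
```

In this toy case, class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12. A gain of 3.7 percentage points corresponds to this value rising by 0.037 on a 0-to-1 scale.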