This work presents a new depth- and semantics-aware conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and can translate raw LiDAR point clouds to RGB-D camera images by relying solely on semantic scene segments. We claim this is the first framework of its kind, and it has practical applications in autonomous vehicles, such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines by a 23.7$\%$ margin in terms of IoU.
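To make the two-stage idea in the abstract concrete, the sketch below illustrates the general pipeline of segment-then-generate translation: a LiDAR range image is first mapped to semantic class logits, and a conditional generator then produces an RGB-D image from the one-hot semantic map. This is a minimal illustrative sketch, not the actual TITAN-Next architecture; all module names (SegmentationNet, RGBDGenerator), channel counts, class count, and the 5-channel range-image projection are assumptions for illustration.

\begin{verbatim}
import torch
import torch.nn as nn

NUM_CLASSES = 20  # assumed number of semantic classes

class SegmentationNet(nn.Module):
    """Toy stand-in for the LiDAR segmentation stage:
    range image -> per-pixel class logits."""
    def __init__(self, in_ch=5, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, x):
        return self.net(x)

class RGBDGenerator(nn.Module):
    """Toy stand-in for the conditional generator:
    one-hot semantic map -> 4-channel RGB-D image."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 4, 3, padding=1),  # RGB + depth
        )

    def forward(self, seg_onehot):
        return self.net(seg_onehot)

# Example forward pass: a batch of LiDAR range images with 5 assumed
# channels (e.g. x, y, z, intensity, range).
lidar = torch.randn(2, 5, 64, 512)
seg_logits = SegmentationNet()(lidar)
seg_onehot = torch.nn.functional.one_hot(
    seg_logits.argmax(1), NUM_CLASSES).permute(0, 3, 1, 2).float()
rgbd = RGBDGenerator()(seg_onehot)  # shape (2, 4, 64, 512)
\end{verbatim}

In the real model, the generator is trained adversarially and conditioned on the semantic segments as the mid-level representation; the toy modules above only show the data flow between the two stages.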