This work presents TITAN-Next, a new depth- and semantics-aware conditional generative model for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and can translate raw LiDAR point clouds to RGB-D camera images by relying solely on semantic scene segments. To the best of our knowledge, this is the first framework of its kind, and it has practical applications in autonomous vehicles, such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines by a margin of 23.7$\%$ in terms of IoU.