Image outpainting technology generates visually plausible content regardless of its authenticity, which makes it unreliable in practical applications. We therefore propose a reliable image outpainting task that introduces sparse depth from LiDARs to extrapolate authentic RGB scenes. The large field of view of LiDARs allows them to serve data augmentation and further multimodal tasks. Concretely, we propose a Depth-Guided Outpainting Network to model the distinct feature representations of the two modalities and to learn structure-aware cross-modal fusion. Two components are designed: 1) the Multimodal Learning Module produces distinct depth and RGB feature representations according to the characteristics of each modality; 2) the Depth Guidance Fusion Module leverages the complete depth modality to guide the generation of RGB content through progressive multimodal feature fusion. Furthermore, we design an additional constraint strategy consisting of a Cross-modal Loss and an Edge Loss to sharpen ambiguous contours and expedite reliable content generation. Extensive experiments on the KITTI and Waymo datasets demonstrate our superiority over state-of-the-art methods, both quantitatively and qualitatively.
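To make the Depth Guidance Fusion Module concrete, the following is a minimal PyTorch sketch of depth-guided fusion at a single scale, assuming the module modulates RGB features with per-pixel gates predicted from depth features and is applied progressively across decoder scales. The class and variable names (`DepthGuidedFusionBlock`, `rgb_feat`, `depth_feat`) are illustrative assumptions, not identifiers from the paper.

```python
# A hedged sketch of one scale of depth-guided RGB/depth feature fusion.
import torch
import torch.nn as nn

class DepthGuidedFusionBlock(nn.Module):
    """Fuses RGB and depth features at one scale; depth acts as guidance."""
    def __init__(self, channels: int):
        super().__init__()
        # Predict per-pixel gates from the depth features so the complete
        # depth modality can steer where RGB content is synthesized.
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        gated_rgb = rgb_feat * self.gate(depth_feat)   # depth-guided gating
        return self.merge(torch.cat([gated_rgb, depth_feat], dim=1))

# Progressive fusion would apply such a block at each decoder scale.
if __name__ == "__main__":
    block = DepthGuidedFusionBlock(channels=64)
    rgb = torch.randn(1, 64, 32, 128)    # RGB branch features
    depth = torch.randn(1, 64, 32, 128)  # LiDAR depth branch features
    fused = block(rgb, depth)
    print(fused.shape)  # torch.Size([1, 64, 32, 128])
```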
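The abstract names a Cross-modal Loss and an Edge Loss but gives no formulas, so the sketch below is one plausible Sobel-gradient reading: the Edge Loss matches edge maps of the predicted and ground-truth RGB, and the Cross-modal Loss aligns RGB edges with depth discontinuities. Both formulations are assumptions for illustration, not the paper's definitions.

```python
# A hedged sketch of the additional constraint strategy (assumed formulation).
import torch
import torch.nn.functional as F

def sobel_edges(x: torch.Tensor) -> torch.Tensor:
    """Per-channel Sobel gradient magnitude, a common edge extractor."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    c = x.shape[1]
    gx = F.conv2d(x, kx.expand(c, 1, 3, 3), padding=1, groups=c)
    gy = F.conv2d(x, ky.expand(c, 1, 3, 3), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_loss(pred_rgb: torch.Tensor, gt_rgb: torch.Tensor) -> torch.Tensor:
    # Penalize ambiguous contours by matching prediction and target edge maps.
    return F.l1_loss(sobel_edges(pred_rgb), sobel_edges(gt_rgb))

def cross_modal_loss(pred_rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
    # Assumed reading: RGB edges should align with depth discontinuities.
    rgb_e = sobel_edges(pred_rgb).mean(dim=1, keepdim=True)
    return F.l1_loss(rgb_e, sobel_edges(depth))
```

In a training loop these terms would typically be weighted and added to the primary reconstruction and adversarial objectives; the weights are hyperparameters not specified in the abstract.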