This paper presents the new task of Fisheye Semantic Completion (FSC), in which dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV). Fisheye cameras have a larger FoV than ordinary pinhole cameras, yet their unique imaging model naturally leads to a blind area at the edge of the image plane. This is suboptimal for safety-critical applications, since important perception tasks such as semantic segmentation become very challenging within the blind zone. Previous works considered out-of-FoV outpainting and in-FoV segmentation separately. However, we observe that these two tasks are actually closely coupled. To jointly estimate the tightly intertwined complete fisheye image and scene semantics, we introduce the new FishDreamer, which builds on successful ViTs enhanced with a novel Polar-aware Cross Attention (PCA) module that leverages dense context and guides semantically consistent content generation while accounting for different polar distributions. In addition to the contribution of the novel task and architecture, we also derive the Cityscapes-BF and KITTI360-BF datasets to facilitate training and evaluation on this new track. Our experiments demonstrate that the proposed FishDreamer outperforms methods solving each task in isolation and surpasses alternative approaches on Fisheye Semantic Completion. Code and datasets will be available at https://github.com/MasterHow/FishDreamer.
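The abstract names the Polar-aware Cross Attention (PCA) module without detailing its mechanism. As a rough illustrative sketch only (not the paper's actual design), one way to make cross attention "polar-aware" is to bias the attention logits by the difference in polar radius between out-of-FoV query tokens and in-FoV key tokens, reflecting that fisheye image statistics vary with distance from the image center. All function names and the specific bias form below are our assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def polar_aware_cross_attention(queries, keys, values, q_radii, k_radii,
                                bias_scale=1.0):
    """Illustrative cross attention between out-of-FoV query tokens and
    in-FoV key/value tokens, biased by polar radius (hypothetical sketch,
    not the PCA module from the paper).

    queries: (Nq, d); keys, values: (Nk, d)
    q_radii: (Nq,), k_radii: (Nk,) -- normalized polar radii of the
    image locations each token corresponds to.
    """
    d = queries.shape[-1]
    # Standard scaled dot-product similarity.
    logits = queries @ keys.T / np.sqrt(d)            # (Nq, Nk)
    # Assumed polar bias: favor in-FoV tokens lying at a similar radius,
    # so generated out-of-FoV content draws on context with comparable
    # fisheye distortion statistics.
    polar_bias = -bias_scale * np.abs(q_radii[:, None] - k_radii[None, :])
    attn = softmax(logits + polar_bias, axis=-1)      # rows sum to 1
    return attn @ values                              # (Nq, d)

# Toy usage with random tokens.
rng = np.random.default_rng(0)
out = polar_aware_cross_attention(
    rng.standard_normal((4, 8)), rng.standard_normal((6, 8)),
    rng.standard_normal((6, 8)), rng.random(4), rng.random(6))
```

The additive bias keeps the operation a drop-in replacement for plain cross attention; any learned radius-dependent weighting could play the same role.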