This paper introduces the new task of Fisheye Semantic Completion (FSC), in which dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV). Fisheye cameras have a larger FoV than ordinary pinhole cameras, yet their unique imaging model naturally leads to a blind area at the edge of the image plane. This is suboptimal for safety-critical applications, since important perception tasks such as semantic segmentation become very challenging within the blind zone. Previous works considered out-of-FoV outpainting and in-FoV segmentation separately. However, we observe that these two tasks are in fact closely coupled. To jointly estimate the tightly intertwined complete fisheye image and scene semantics, we introduce FishDreamer, which builds on vision transformers (ViTs) enhanced with a novel Polar-aware Cross Attention (PCA) module that leverages dense context and guides semantically consistent content generation while accounting for different polar distributions. Beyond the contribution of the novel task and architecture, we also derive the Cityscapes-BF and KITTI360-BF datasets to facilitate training and evaluation on this new track. Our experiments demonstrate that the proposed FishDreamer outperforms methods solving each task in isolation and surpasses alternative approaches on the Fisheye Semantic Completion task. Code and datasets are publicly available at https://github.com/MasterHow/FishDreamer.
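The abstract does not detail the PCA module, but the core idea it describes (out-of-FoV content queries attending to dense in-FoV context, modulated by polar position) can be illustrated with a minimal, hypothetical sketch. All shapes, the radius-based bias, and the function name below are illustrative assumptions, not the paper's actual design:

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def polar_aware_cross_attention(q_tokens, kv_tokens, q_radii, kv_radii,
                                d=16, seed=0):
    """Toy cross-attention: out-of-FoV query tokens attend to in-FoV
    key/value tokens, with an additive bias so that tokens at similar
    polar radii attend to each other more strongly.

    NOTE: a hypothetical sketch of the general mechanism, not the
    paper's PCA module; projections are random for illustration.
    """
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((q_tokens.shape[-1], d))
    Wk = rng.standard_normal((kv_tokens.shape[-1], d))
    Wv = rng.standard_normal((kv_tokens.shape[-1], d))
    Q, K, V = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv

    scores = Q @ K.T / np.sqrt(d)
    # Polar bias: penalize attention between tokens at very different radii.
    bias = -np.abs(q_radii[:, None] - kv_radii[None, :])
    attn = softmax(scores + bias, axis=-1)
    return attn @ V
```

A usage sketch: query tokens near the image rim (large radius) would preferentially aggregate context from in-FoV tokens at comparable radii, which is one plausible way to respect the polar distribution the abstract mentions.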