Bird's-eye-view (BEV) semantic segmentation is crucial for autonomous driving due to its powerful spatial representation ability. Estimating BEV semantic maps from monocular images is challenging because of the spatial gap: the model is implicitly required to perform both the perspective-to-BEV transformation and segmentation. We present a novel two-stage Geometry Prior-based Transformation framework named GitNet, consisting of (i) geometry-guided pre-alignment and (ii) a ray-based transformer. In the first stage, we decouple BEV segmentation into perspective-image segmentation and geometry prior-based mapping, with explicit supervision obtained by projecting the BEV semantic labels onto the image plane to learn visibility-aware features, together with learnable geometry for translating them into BEV space. In the second stage, the pre-aligned coarse BEV features are further refined by ray-based transformers that take visibility knowledge into account. GitNet achieves leading performance on the challenging nuScenes and Argoverse datasets. The code will be made publicly available.
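The geometry prior-based mapping described above can be illustrated with a minimal pinhole-camera sketch: points on the BEV ground plane are projected into the image so that image features (or labels) can be associated with BEV cells. This is an illustrative assumption, not the paper's implementation; the camera intrinsics `K`, rotation `R`, and translation `t` below are made-up example values.

```python
import numpy as np

def project_bev_to_image(bev_xy, K, R, t):
    """Map BEV ground-plane points (x, y, z=0) to pixel coordinates.

    Illustrative pinhole projection: a geometric prior of this kind can
    pre-align perspective features with BEV locations.
    """
    n = bev_xy.shape[0]
    pts_world = np.hstack([bev_xy, np.zeros((n, 1))])  # ground plane: z = 0
    pts_cam = (R @ pts_world.T + t.reshape(3, 1)).T    # world -> camera frame
    valid = pts_cam[:, 2] > 1e-6                       # keep points in front of camera
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                        # perspective divide
    return uv, valid

# Toy setup (assumed values): camera 1.5 m above the ground, looking forward.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.array([[1.0, 0.0,  0.0],    # world x -> camera x
              [0.0, 0.0, -1.0],    # world z (up) -> camera -y
              [0.0, 1.0,  0.0]])   # world y (forward) -> camera z
t = np.array([0.0, 1.5, 0.0])      # camera height above ground

bev_grid = np.array([[0.0, 10.0],  # 10 m straight ahead
                     [2.0, 20.0]]) # 20 m ahead, 2 m to the right
uv, valid = project_bev_to_image(bev_grid, K, R, t)
```

Note that farther BEV points project closer to the horizon line in the image, which is why a fixed ground-plane mapping alone yields only coarse BEV features that later stages must refine.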