Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Enze Xie, Zhiqi Li, Hanming Deng, Hao Tian, Xizhou Zhu, Li Chen, Yulu Gao, Xiangwei Geng, Jia Zeng, Yang Li, Jiazhi Yang, Xiaosong Jia, Bohan Yu, Yu Qiao, Dahua Lin, Si Liu, Junchi Yan, Jianping Shi, Ping Luo

Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention from both industry and academia. Conventional approaches for most autonomous driving algorithms perform detection, segmentation, tracking, etc., in a front or perspective view. As sensor configurations become more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important. BEV perception has several advantages: representing surrounding scenes in BEV is intuitive and fusion-friendly, and representing objects in BEV is most desirable for subsequent modules such as planning and/or control. The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground truth annotations in the BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios. In this survey, we review the most recent work on BEV perception and provide an in-depth analysis of different solutions. Moreover, several systematic designs of BEV approaches from industry are described as well. Furthermore, we introduce a practical guidebook for improving the performance of BEV perception tasks, covering camera, LiDAR and fusion inputs. Finally, we point out future research directions in this area. We hope this report will shed some light for the community and encourage more research effort on BEV perception. We keep an active repository to collect the most recent work and provide a toolbox with a bag of tricks at https://github.com/OpenPerceptionX/BEVPerception-Survey-Recipe.
