Bird's-Eye View (BEV) perception has received increasing attention in recent years, as it provides a concise and unified spatial representation across views and benefits a diverse set of downstream driving applications. While the focus has been placed on discriminative tasks such as BEV segmentation, the dual generative task of creating street-view images from a BEV layout has rarely been explored. The ability to generate realistic street-view images that align with a given HD map and traffic layout is critical for visualizing complex traffic scenarios and developing robust perception models for autonomous driving. In this paper, we propose BEVGen, a conditional generative model that synthesizes a set of realistic and spatially consistent surrounding images that match the BEV layout of a traffic scenario. BEVGen incorporates a novel cross-view transformation and spatial attention design that learns the relationship between camera and map views to ensure their consistency. Our model can accurately render road and lane lines, as well as generate traffic scenes under different weather conditions and times of day. The code will be made publicly available.