Scalable sensor simulation is an important yet challenging open problem for safety-critical domains such as self-driving. Existing image simulation methods either fail to be photorealistic or do not model the 3D environment and the dynamic objects within it, losing high-level control and physical realism. In this paper, we present GeoSim, a geometry-aware image composition process which synthesizes novel urban driving scenarios by augmenting existing images with dynamic objects extracted from other scenes and rendered at novel poses. Towards this goal, we first build a diverse bank of 3D objects with both realistic geometry and appearance from sensor data. During simulation, we perform a novel geometry-aware simulation-by-composition procedure which 1) proposes plausible and realistic object placements into a given scene, 2) renders novel views of dynamic objects from the asset bank, and 3) composes and blends the rendered image segments. The resulting synthetic images are realistic, traffic-aware, and geometrically consistent, allowing our approach to scale to complex use cases. We demonstrate two such important applications: long-range realistic video simulation across multiple camera sensors, and synthetic data generation for data augmentation on downstream segmentation tasks. High-resolution video results are available at https://tmux.top/publication/geosim/.
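To make step 3 of the procedure concrete, the following is a minimal sketch of depth-tested alpha compositing, a generic technique for pasting a rendered object segment into an existing image while respecting occlusion. It is an illustration under stated assumptions, not GeoSim's actual blending method, which the abstract does not specify. The function name `composite`, its parameters, and the per-pixel depth maps are all hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]


def composite(
    background: List[List[Pixel]],
    bg_depth: List[List[float]],
    obj_rgb: List[List[Pixel]],
    obj_alpha: List[List[float]],
    obj_depth: List[List[float]],
) -> List[List[Pixel]]:
    """Blend a rendered object into a background image, pixel by pixel.

    Assumption (not from the paper): every input is a same-sized 2D grid,
    and depth is the distance from the camera, so smaller means closer.
    """
    out: List[List[Pixel]] = []
    for y, bg_row in enumerate(background):
        row: List[Pixel] = []
        for x, bg in enumerate(bg_row):
            alpha = obj_alpha[y][x]
            # Occlusion test: draw the object only where it covers the
            # pixel and lies in front of the existing scene geometry.
            if alpha > 0.0 and obj_depth[y][x] < bg_depth[y][x]:
                fg = obj_rgb[y][x]
                # Standard "over" blend: alpha * foreground + (1 - alpha) * background.
                row.append(tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg)))
            else:
                row.append(bg)
        out.append(row)
    return out
```

The depth comparison is what makes the composition geometry-aware in this sketch: an inserted vehicle that sits behind an existing scene element stays hidden at those pixels, while soft alpha values at the segment boundary avoid a hard cut-out edge.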