Semantic grids are a useful representation of the environment around a robot. In autonomous vehicles they concisely represent the scene around the car, capturing vital information for downstream tasks such as navigation or collision assessment. Information from different sensors can be used to generate these grids: some methods rely only on RGB images, while others also incorporate information from other sensors, such as radar or LiDAR. In this paper, we present an architecture that fuses LiDAR and camera information to generate semantic grids. By using the 3D information from a LiDAR point cloud, the LiDAR-Aided Perspective Transform Network (LAPTNet) associates features in the camera plane with the bird's-eye view without having to predict any depth information about the scene. Compared to state-of-the-art camera-only approaches, LAPTNet achieves an improvement of up to 8.8 points (or 38.13%) for the classes proposed in the NuScenes validation split.
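To make the core idea concrete, the sketch below illustrates one way a LiDAR-aided perspective transform can be realized: LiDAR points are projected into the image with the camera intrinsics and extrinsics, camera features are sampled at the resulting pixels, and those features are scattered onto the bird's-eye-view grid at each point's ground-plane position. This is a minimal, hedged illustration rather than the authors' implementation; all tensor shapes, the transform conventions, and the grid parameters (`bev_size`, `bev_range`) are assumptions.

```python
import torch

def lidar_aided_projection(cam_feats, lidar_xyz, intrinsics, cam_from_lidar,
                           bev_size=(200, 200), bev_range=50.0):
    """cam_feats: (C, H, W) image features; lidar_xyz: (N, 3) points in the LiDAR frame;
    intrinsics: (3, 3) camera matrix; cam_from_lidar: (4, 4) rigid transform (assumed)."""
    C, H, W = cam_feats.shape
    # Bring the LiDAR points into the camera frame and drop points behind the camera.
    pts_h = torch.cat([lidar_xyz, torch.ones(len(lidar_xyz), 1)], dim=1)
    pts_cam = (cam_from_lidar @ pts_h.T).T[:, :3]
    front = pts_cam[:, 2] > 1e-3
    pts_cam, pts_xy = pts_cam[front], lidar_xyz[front, :2]
    # Pinhole projection gives the pixel each 3D point falls on: no depth prediction is needed.
    uv = (intrinsics @ pts_cam.T).T
    u = (uv[:, 0] / uv[:, 2]).round().long()
    v = (uv[:, 1] / uv[:, 2]).round().long()
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Sample camera features at the projected pixel locations.
    feats = cam_feats[:, v[inside], u[inside]]            # (C, M)
    # Scatter the sampled features onto the BEV grid using the points' ground-plane position.
    xb = ((pts_xy[inside, 0] + bev_range) / (2 * bev_range) * bev_size[0]).long()
    yb = ((pts_xy[inside, 1] + bev_range) / (2 * bev_range) * bev_size[1]).long()
    keep = (xb >= 0) & (xb < bev_size[0]) & (yb >= 0) & (yb < bev_size[1])
    bev = torch.zeros(C, *bev_size)
    bev[:, xb[keep], yb[keep]] = feats[:, keep]           # simple overwrite; a real model would pool
    return bev
```

Because the LiDAR points supply metric depth directly, this transform avoids the per-pixel depth estimation that camera-only BEV methods rely on.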