Semantic grids can be useful representations of the scene around an autonomous system. With information about the layout of the surrounding space, a robot can leverage this type of representation for crucial tasks such as navigation or tracking. Fusing information from multiple sensors increases robustness and lowers the computational load of the task, enabling real-time performance. Our multi-scale LiDAR-Aided Perspective Transform network uses the information available in point clouds to guide the projection of image features to a top-view representation. It improves on the state of the art in semantic grid generation for the human (+8.67% relative) and movable object (+49.07% relative) classes on the nuScenes dataset, achieves results close to the state of the art for the vehicle, drivable area, and walkway classes, and performs inference at 25 FPS.
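To make the core idea concrete, below is a minimal sketch of LiDAR-guided projection of image features to a top-view grid: LiDAR points are projected into the camera image, each point picks up the feature of the pixel that sees it, and that feature is scattered into the point's bird's-eye-view cell. This is an illustration under simplifying assumptions, not the paper's multi-scale network; the function `lidar_guided_bev_features`, its parameter names, and the mean-pooling per cell are all hypothetical choices for exposition.

```python
import numpy as np

def lidar_guided_bev_features(points, feats_img, K, T_cam_from_lidar,
                              grid_size=200, cell_m=0.5):
    """Scatter per-pixel image features into a top-view (BEV) grid,
    using LiDAR points to decide where each image feature lands.

    points:            (N, 3) LiDAR points in the LiDAR frame.
    feats_img:         (H, W, C) image feature map.
    K:                 (3, 3) camera intrinsic matrix.
    T_cam_from_lidar:  (4, 4) extrinsics mapping LiDAR -> camera frame.
    """
    H, W, C = feats_img.shape

    # 1. Transform LiDAR points into the camera frame.
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    front = pts_cam[:, 2] > 1e-3
    pts_cam, pts_lidar = pts_cam[front], points[front]

    # 2. Project into the image to find which pixel "sees" each point.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, pts_lidar = u[inside], v[inside], pts_lidar[inside]

    # 3. Each point carries its pixel's feature to its (x, y) BEV cell.
    half = grid_size * cell_m / 2.0
    gx = ((pts_lidar[:, 0] + half) / cell_m).astype(int)
    gy = ((pts_lidar[:, 1] + half) / cell_m).astype(int)
    ok = (gx >= 0) & (gx < grid_size) & (gy >= 0) & (gy < grid_size)

    # Mean-pool the features falling into each cell (a simplification;
    # a learned transform would be used in practice).
    bev = np.zeros((grid_size, grid_size, C), dtype=feats_img.dtype)
    count = np.zeros((grid_size, grid_size, 1))
    np.add.at(bev, (gy[ok], gx[ok]), feats_img[v[ok], u[ok]])
    np.add.at(count, (gy[ok], gx[ok]), 1.0)
    return bev / np.maximum(count, 1.0)
```

In this sketch the depth information supplied by the LiDAR resolves the ambiguity a camera-only perspective transform faces: without it, a pixel's feature could lie anywhere along its viewing ray, whereas here each feature is placed at a measured 3D location before being pooled into the grid.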