Unmanned aerial vehicles (UAVs) equipped with multiple complementary sensors have tremendous potential for fast autonomous or remote-controlled semantic scene analysis, e.g., for disaster examination. In this work, we propose a UAV system for real-time semantic inference and fusion of multiple sensor modalities. Semantic segmentation of LiDAR scans and RGB images, as well as object detection on RGB and thermal images, run online onboard the UAV computer using lightweight CNN architectures and embedded inference accelerators. We follow a late fusion approach where semantic information from multiple modalities augments 3D point clouds and image segmentation masks while also generating an allocentric semantic map. Our system provides augmented semantic images and point clouds at $\approx 9\,$Hz. We evaluate the integrated system in real-world experiments in an urban environment.
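To make the late-fusion step concrete, the sketch below shows one common way such a fusion can be realized: LiDAR points are transformed into the camera frame, projected through a pinhole model, and take the class label of the segmentation-mask pixel they land on. This is a minimal illustration under assumed interfaces (the names `K`, `T_cam_lidar`, and `fuse_labels` are hypothetical), not the authors' actual implementation.

```python
# Hedged sketch of label transfer from an image segmentation mask to a LiDAR
# point cloud (late fusion). All names and shapes are illustrative assumptions.
import numpy as np

def fuse_labels(points_lidar, seg_mask, K, T_cam_lidar):
    """Attach per-point semantic labels sampled from a 2D segmentation mask.

    points_lidar : (N, 3) XYZ points in the LiDAR frame.
    seg_mask     : (H, W) integer class labels from the RGB segmentation CNN.
    K            : (3, 3) camera intrinsic matrix.
    T_cam_lidar  : (4, 4) homogeneous LiDAR-to-camera extrinsics.
    Returns (N,) labels; -1 marks points outside the camera frustum.
    """
    n = points_lidar.shape[0]
    # Transform points into the camera frame using homogeneous coordinates.
    pts_h = np.hstack([points_lidar, np.ones((n, 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    labels = np.full(n, -1, dtype=np.int32)
    in_front = pts_cam[:, 2] > 0.0              # keep points in front of the camera
    uv = (K @ pts_cam[in_front].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)   # perspective division -> pixel coords

    h, w = seg_mask.shape
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    idx = np.flatnonzero(in_front)[valid]
    labels[idx] = seg_mask[uv[valid, 1], uv[valid, 0]]
    return labels
```

A design note consistent with the abstract: late fusion keeps the per-modality networks independent, so each lightweight CNN can run on its own embedded accelerator and only the compact semantic outputs need to be combined.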