With information from multiple input modalities, sensor-fusion-based algorithms usually outperform their single-modality counterparts in robotics. Camera and LIDAR, with complementary semantic and depth information, are the typical choices for detection tasks in complicated driving environments. For most camera-LIDAR fusion algorithms, however, the calibration quality of the sensor suite greatly affects performance. More specifically, the detection algorithm usually requires an accurate geometric relationship among the sensors as input, and it is often assumed that the contents from these sensors are captured at the same time. Preparing such sensor suites involves carefully designed calibration rigs and accurate synchronization mechanisms, and the preparation process is usually done offline. In this work, a segmentation-based framework is proposed to jointly estimate the geometric and temporal parameters in the calibration of a camera-LIDAR suite. A semantic segmentation mask is first applied to both sensor modalities, and the calibration parameters are optimized through a pixel-wise bidirectional loss. We specifically incorporate the velocity information from optical flow for the temporal parameters. Since supervision is performed only at the segmentation level, no calibration label is needed within the framework. The proposed algorithm is tested on the KITTI dataset, and the results show accurate real-time calibration of both the geometric and temporal parameters.
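The core idea of optimizing calibration through segmentation agreement can be illustrated with a minimal sketch. The snippet below is a hypothetical simplification, not the paper's actual bidirectional loss: it projects semantically labeled LIDAR points into the image with candidate extrinsics `(R, t)` and pinhole intrinsics `K`, then scores how often the projected point labels disagree with the camera segmentation mask. A calibration search would minimize this score over `(R, t)`; all function and variable names here are illustrative.

```python
import numpy as np

def segmentation_alignment_loss(points_lidar, point_labels, cam_mask, R, t, K):
    """Fraction of projected LIDAR points whose semantic label disagrees with
    the camera segmentation mask. Lower is better; a calibration optimizer
    would minimize this over the extrinsics (R, t).

    points_lidar : (N, 3) 3D points in the LIDAR frame
    point_labels : (N,)   per-point semantic class ids
    cam_mask     : (H, W) per-pixel semantic class ids from the camera
    R, t         : candidate LIDAR-to-camera rotation (3x3) and translation (3,)
    K            : camera intrinsic matrix (3x3)
    """
    # Transform into the camera frame and keep points in front of the camera.
    pts_cam = points_lidar @ R.T + t
    front = pts_cam[:, 2] > 0
    pts_cam, labels = pts_cam[front], point_labels[front]

    # Pinhole projection with perspective divide.
    uv = pts_cam @ K.T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)

    # Discard projections that fall outside the image.
    h, w = cam_mask.shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv, labels = uv[inside], labels[inside]
    if len(uv) == 0:
        return 1.0  # no overlap: worst possible score

    agree = cam_mask[uv[:, 1], uv[:, 0]] == labels
    return 1.0 - agree.mean()
```

In the paper's framework this comparison is made bidirectional (image-to-points and points-to-image) and extended with optical-flow velocity terms so the temporal offset can be estimated jointly with the geometry; the sketch above only captures the geometric direction.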