We introduce a novel architecture, UniCal, for Camera-to-LiDAR (C2L) extrinsic calibration that leverages self-attention mechanisms through a Transformer-based backbone network to infer the six-degree-of-freedom (6-DoF) relative transformation between the sensors. Unlike previous methods, UniCal performs an early fusion of the input camera and LiDAR data by aggregating camera image channels and LiDAR mappings into a multi-channel unified representation before extracting their features jointly with a single-branch architecture. This single-branch design makes UniCal lightweight, which is desirable in resource-constrained applications such as autonomous driving. Through experiments, we show that UniCal achieves state-of-the-art results compared to existing methods. We also show that, through transfer learning, weights learned on the calibration task can be applied to a calibration validation task without re-training the backbone.
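To make the early-fusion, single-branch idea concrete, here is a minimal PyTorch sketch. It is not the paper's implementation: the channel layout (RGB plus two LiDAR maps projected to the image plane, e.g. depth and intensity), the patch size, the model dimensions, and the 6-DoF output parameterization (3 translations + 3 rotation parameters) are all illustrative assumptions. The key point it demonstrates is that the camera and LiDAR channels are concatenated into one tensor before any feature extraction, so a single Transformer branch processes both modalities jointly.

```python
# Minimal sketch of an early-fusion, single-branch C2L calibration network.
# Assumptions (not from the paper): input sizes, 3 RGB + 2 LiDAR channels,
# patch size, model dims, and a translation + Euler-angle 6-DoF output.
import torch
import torch.nn as nn

class EarlyFusionCalibNet(nn.Module):
    def __init__(self, img_channels=3, lidar_channels=2, patch=16,
                 dim=256, depth=6, heads=8, img_size=224):
        super().__init__()
        in_ch = img_channels + lidar_channels  # unified multi-channel input
        # Patch embedding over the fused tensor (one branch, no per-sensor encoder).
        self.embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=depth)
        # Regress the 6-DoF relative transform from the pooled tokens.
        self.head = nn.Linear(dim, 6)

    def forward(self, rgb, lidar_maps):
        # Early fusion: concatenate camera channels and LiDAR mappings.
        x = torch.cat([rgb, lidar_maps], dim=1)        # (B, 3+2, H, W)
        x = self.embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        x = self.encoder(x + self.pos)                 # joint feature extraction
        return self.head(x.mean(dim=1))                # (B, 6)

# Usage: camera image + miscalibrated LiDAR projections in, 6-DoF estimate out.
net = EarlyFusionCalibNet()
pred = net(torch.randn(2, 3, 224, 224), torch.randn(2, 2, 224, 224))
print(pred.shape)  # torch.Size([2, 6])
```

Under this reading, the transfer-learning claim in the abstract would amount to freezing `embed` and `encoder` and swapping `head` for a classification layer on the validation task, though the paper's exact procedure may differ.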