Autonomous systems require continuous and dependable environment perception for navigation and decision-making, which is best achieved by combining different sensor types. Radar continues to function robustly under adverse conditions that impair cameras, ensuring a steady inflow of information. Yet camera images provide a more intuitive and readily interpretable impression of the world. This work combines the complementary strengths of both sensor types in a unique self-learning fusion approach for probabilistic scene reconstruction under adverse surrounding conditions. After reducing the memory requirements of both high-dimensional measurements through a decoupled stochastic self-supervised compression technique, the proposed algorithm exploits similarities and establishes correspondences between both domains at different feature levels during training. Then, at inference time, relying exclusively on radio-frequency measurements, the model successively predicts camera constituents in an autoregressive and self-contained process. These discrete tokens are finally transformed back into an instructive view of the respective surroundings, making potential dangers visually perceptible for important downstream tasks.
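The pipeline outlined above, discrete tokenization of sensor measurements followed by autoregressive radar-to-camera token prediction, can be illustrated with a minimal sketch. This is not the paper's architecture: the VQ-VAE-style quantizer, the small transformer, and all module names, shapes, and hyperparameters below are illustrative assumptions.

```python
# Minimal illustrative sketch of a tokenize-then-predict pipeline.
# All names, dimensions, and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Compresses continuous features into discrete codebook tokens."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                            # z: (B, N, dim)
        d = torch.cdist(z, self.codebook.weight)     # distance to every code
        tokens = d.argmin(dim=-1)                    # nearest code index per feature
        z_q = self.codebook(tokens)                  # quantized features
        z_q = z + (z_q - z).detach()                 # straight-through gradient
        return z_q, tokens

class RadarToCameraModel(nn.Module):
    """Autoregressively predicts camera tokens conditioned on radar tokens."""
    def __init__(self, num_codes=512, dim=64, cam_len=256):
        super().__init__()
        self.tok_emb = nn.Embedding(num_codes, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_codes)
        self.cam_len = cam_len

    @torch.no_grad()
    def generate(self, radar_tokens):                # radar_tokens: (B, M)
        seq = radar_tokens
        for _ in range(self.cam_len):                # one camera token per step
            h = self.transformer(self.tok_emb(seq))
            logits = self.head(h[:, -1])             # next-token distribution
            nxt = torch.multinomial(logits.softmax(-1), 1)  # sample: probabilistic
            seq = torch.cat([seq, nxt], dim=1)
        return seq[:, radar_tokens.size(1):]         # camera tokens only

# Usage: compress a stand-in radar feature map, then predict camera tokens.
radar_feats = torch.randn(1, 128, 64)               # stand-in radar encoder output
_, radar_tokens = VectorQuantizer()(radar_feats)
cam_tokens = RadarToCameraModel().generate(radar_tokens)
print(cam_tokens.shape)  # torch.Size([1, 256]); a decoder would render the image
```

Sampling from the predicted distribution rather than taking the argmax is what makes the reconstruction probabilistic: repeated runs yield plausible variations of the scene rather than a single deterministic image.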