The neuromorphic spike camera generates data streams with high temporal resolution in a bio-inspired way, which holds vast potential for real-world applications such as autonomous driving. In contrast to RGB streams, spike streams have an inherent advantage in overcoming motion blur, leading to more accurate depth estimation for high-velocity objects. However, training a spike depth estimation network in a supervised manner is almost impossible, since obtaining paired depth labels for temporally intensive spike streams is extremely laborious and challenging. In this paper, instead of building a spike stream dataset with full depth labels, we transfer knowledge from open-source RGB datasets (e.g., KITTI) and estimate spike depth in an unsupervised manner. The key challenges of this problem lie in the modality gap between the RGB and spike modalities, and the domain gap between the labeled source RGB domain and the unlabeled target spike domain. To overcome these challenges, we introduce a cross-modality cross-domain (BiCross) framework for unsupervised spike depth estimation. Our method narrows the enormous gap between source RGB and target spike by introducing an intermediate simulated source spike domain. Specifically, for the cross-modality phase, we propose a novel Coarse-to-Fine Knowledge Distillation (CFKD) that transfers image-level and pixel-level knowledge from source RGB to source spike. This design leverages the abundant semantic information of the RGB modality and the dense temporal information of the spike modality. For the cross-domain phase, we introduce an Uncertainty Guided Mean-Teacher (UGMT) to generate reliable pseudo labels with uncertainty estimation, alleviating the shift between the source spike and target spike domains. In addition, we propose a Global-Level Feature Alignment method (GLFA) to align features between the two domains and generate more reliable pseudo labels.
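To make the uncertainty-guided pseudo-labeling idea concrete, the following is a minimal sketch of the general mean-teacher mechanism it builds on: a teacher network updated as an exponential moving average (EMA) of the student produces depth predictions under multiple augmented passes, and per-pixel disagreement (standard deviation) across those passes serves as an uncertainty estimate to mask out unreliable pseudo labels. This is an illustrative reconstruction, not the paper's implementation; the function names, the choice of std as the uncertainty measure, and the threshold value are assumptions for illustration.

```python
import numpy as np

def ema_update(teacher_w, student_w, alpha=0.99):
    """Mean-teacher weight update: the teacher tracks the student slowly
    via an exponential moving average (alpha close to 1)."""
    return alpha * teacher_w + (1.0 - alpha) * student_w

def uncertainty_masked_pseudo_labels(teacher_preds, threshold=0.05):
    """Build pseudo depth labels with an uncertainty mask.

    teacher_preds: array of shape (K, H, W) -- K teacher depth maps
                   from K stochastic/augmented forward passes.
    Returns the mean depth map and a boolean mask that is True only
    where the per-pixel std (our uncertainty proxy) is below threshold.
    """
    mean_depth = teacher_preds.mean(axis=0)
    uncertainty = teacher_preds.std(axis=0)
    mask = uncertainty < threshold  # keep only low-uncertainty pixels
    return mean_depth, mask

# Toy example: 5 noisy teacher passes over a 4x4 "depth map".
rng = np.random.default_rng(0)
preds = np.stack([1.0 + 0.01 * rng.standard_normal((4, 4)) for _ in range(5)])
pseudo, mask = uncertainty_masked_pseudo_labels(preds, threshold=0.05)
# Only pixels where mask is True would contribute to the student's
# pseudo-label loss on the unlabeled target spike domain.
```

In this setup the student is trained on the masked pseudo labels, while `ema_update` keeps the teacher a smoothed, more stable copy of the student, which is what makes its predictions usable as supervision.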