Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy, chiefly due to the lack of high-quality, labeled datasets. In this work, we introduce ESS (Event-based Semantic Segmentation), which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for new and exciting research directions in fields previously inaccessible to event cameras.