Event cameras are novel bio-inspired sensors that asynchronously capture pixel-level intensity changes in the form of "events". The innovative way they acquire data offers several advantages over standard devices, especially under poor lighting and high-speed motion conditions. However, the novelty of these sensors means there is a lack of training data large enough to fully unlock their potential. The most common approach researchers adopt to address this issue is to leverage simulated event data. Yet this approach comes with an open research question: how well do simulated data generalize to real data? To answer this, we propose to exploit, in the event-based context, recent Domain Adaptation (DA) advances from traditional computer vision, showing that DA techniques applied to event data help reduce the sim-to-real gap. To this end, we propose a novel architecture, which we call Multi-View DA4E (MV-DA4E), that better exploits the peculiarities of frame-based event representations while also promoting domain-invariant characteristics in features. Through extensive experiments, we prove the effectiveness of DA methods and MV-DA4E on N-Caltech101. Moreover, we validate their soundness in a real-world scenario through a cross-domain analysis on the popular RGB-D Object Dataset (ROD), which we extended to the event modality (RGB-E).