Neuromorphic event cameras, which capture the optical changes of a scene, have drawn increasing attention due to their high speed and low power consumption. However, event data are noisy, sparse, and non-uniform in the spatio-temporal domain, with extremely high temporal resolution, which makes it challenging to design backend algorithms for event-based vision. Existing methods encode events into point-cloud-based or voxel-based representations, but suffer from noise and/or information loss. In addition, little research has systematically studied how to handle both static and dynamic scenes with one universal design for event-based vision. This work proposes the Aligned Event Tensor (AET) as a novel event data representation, together with a compact framework called Event Frame Net (EFN), which enables event-based vision under both static and dynamic scenes. The proposed AET and EFN are evaluated on various datasets and shown to surpass existing state-of-the-art methods by large margins. Our method is also efficient and achieves the fastest inference speed among the compared methods.
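As a point of reference for the voxel-based representations mentioned above, the following is a minimal sketch (not the paper's AET) of a generic event-to-voxel-grid encoding; the function name, argument layout, and the (num_bins, H, W) grid shape are assumptions for illustration only.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, H, W, num_bins):
    """Accumulate event polarities into a (num_bins, H, W) voxel grid.

    xs, ys : integer pixel coordinates of the events
    ts     : event timestamps (any monotonic unit)
    ps     : event polarities in {-1, +1}
    """
    grid = np.zeros((num_bins, H, W), dtype=np.float32)
    if len(ts) == 0:
        return grid
    # Normalise timestamps to [0, 1) and assign each event to a temporal bin.
    t_norm = (ts - ts.min()) / max(ts.max() - ts.min(), 1e-9)
    bins = np.clip((t_norm * num_bins).astype(int), 0, num_bins - 1)
    # Sum polarities per (bin, y, x) cell; duplicate indices accumulate.
    np.add.at(grid, (bins, ys, xs), ps)
    return grid
```

Such fixed-bin accumulation illustrates the kind of information loss (temporal quantisation, noise accumulation) that the abstract attributes to existing representations.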