Event cameras record sparse illumination changes with high temporal resolution and high dynamic range. Thanks to their sparse recording and low power consumption, they are increasingly used in applications such as AR/VR and autonomous driving. Current top-performing methods often ignore specific event-data properties, leading to generic but computationally expensive algorithms, while event-aware methods do not perform as well. We propose Event Transformer+, which improves our seminal work EvT with a refined patch-based event representation and a more robust backbone to achieve more accurate results, while still benefiting from event-data sparsity to increase efficiency. Additionally, we show how our system can work with different data modalities and propose task-specific output heads, both for event-stream predictions (e.g., action recognition) and per-pixel predictions (dense depth estimation). Evaluation results show better performance than the state of the art while requiring minimal computational resources, on both GPU and CPU.
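The abstract does not detail how the patch-based representation exploits sparsity. As a rough illustration only, the sketch below shows one plausible way to tokenize an event stream into patches and keep only the active ones; the function name, tokenization choices, and thresholds are our assumptions, not the actual EvT+ method.

```python
import numpy as np

def events_to_patch_tokens(events, height, width, patch_size=8, min_events=1):
    """Illustrative sketch (not EvT+'s actual representation):
    build sparse patch tokens from an event stream.

    events: array of shape (N, 4) with columns (x, y, t, polarity in {0, 1}).
    Returns one token per active patch plus its (row, col) patch position.
    """
    # Accumulate per-pixel event counts, one channel per polarity.
    frame = np.zeros((2, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = events[:, 3].astype(int)
    np.add.at(frame, (p, y, x), 1.0)

    # Split the frame into non-overlapping patches.
    ph, pw = height // patch_size, width // patch_size
    patches = frame.reshape(2, ph, patch_size, pw, patch_size)
    patches = patches.transpose(1, 3, 0, 2, 4).reshape(ph, pw, -1)

    # Keep only patches that received events: the sparsity of the
    # stream directly reduces the number of tokens the backbone processes.
    counts = patches.sum(axis=-1)
    rows, cols = np.nonzero(counts >= min_events)
    tokens = positions = None
    tokens = patches[rows, cols]               # (num_active, 2 * patch_size**2)
    positions = np.stack([rows, cols], axis=1)  # patch coords for positional encoding
    return tokens, positions

# Example: 1000 random events on a 128x128 sensor.
rng = np.random.default_rng(0)
ev = np.stack([rng.integers(0, 128, 1000), rng.integers(0, 128, 1000),
               np.sort(rng.random(1000)), rng.integers(0, 2, 1000)], axis=1)
tokens, pos = events_to_patch_tokens(ev, 128, 128)
print(tokens.shape, pos.shape)
```

Under this reading, a sparser stream activates fewer patches, so the transformer backbone attends over fewer tokens, which is consistent with the efficiency claim on both GPU and CPU.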