Event cameras are sensors of great interest for many applications that run in low-resource and challenging environments. They log sparse illumination changes with high temporal resolution and high dynamic range, while presenting minimal power consumption. However, top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms. Efforts toward efficient solutions usually do not achieve top-accuracy results for complex tasks. This work proposes a novel framework, Event Transformer (EvT), that effectively takes advantage of event-data properties to be highly efficient and accurate. We introduce a new patch-based event representation and a compact transformer-like architecture to process it. EvT is evaluated on different event-based benchmarks for action and gesture recognition. Evaluation results show accuracy better than or comparable to the state of the art while requiring significantly fewer computation resources, enabling EvT to work with minimal latency on both GPU and CPU.