Enabling artificial neural networks (ANNs) to develop temporal understanding in visual tasks is an essential requirement for complete perception of video sequences. A wide range of benchmark datasets is available to evaluate such capabilities on conventional frame-based video sequences. In contrast, evaluating them for systems targeting neuromorphic data remains a challenge due to the lack of appropriate datasets. In this work we define a new benchmark task for action recognition in event-based video sequences, DVS-Gesture-Chain (DVS-GC), based on the temporal combination of multiple gestures from the widely used DVS-Gesture dataset. This methodology allows the creation of datasets that are arbitrarily complex in the temporal dimension. Using the newly defined task, we evaluate the spatio-temporal understanding of different feed-forward convolutional ANNs and convolutional Spiking Neural Networks (SNNs). Our study shows that the original DVS-Gesture benchmark can be solved by networks without temporal understanding, unlike the new DVS-GC, which demands an understanding of the ordering of events. We then provide a study showing how certain elements, such as spiking neurons or time-dependent weights, enable temporal understanding in feed-forward networks without the need for recurrent connections. Code available at: https://github.com/VicenteAlex/DVS-Gesture-Chain
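To make the chaining methodology concrete, the sketch below shows one plausible way to assemble a DVS-GC-style sample: shift and concatenate the event streams of individual DVS-Gesture recordings, and label the result with the ordered tuple of gesture classes, so that the same gestures in a different order yield a different label. The event representation, function names, and the `gap_us` parameter are illustrative assumptions, not the repository's actual API; see the linked code for the authoritative implementation.

```python
import numpy as np
from itertools import product

# Assumed event representation: each gesture recording is a NumPy structured
# array with fields 't' (timestamp, microseconds), 'x', 'y', and 'p' (polarity).

def chain_gestures(gesture_events, gap_us=0):
    """Concatenate per-gesture event streams into one chained sequence.

    gesture_events: list of structured arrays sharing the same dtype.
    Timestamps are re-based so the gestures play back-to-back,
    separated by `gap_us` microseconds of silence.
    """
    chained = []
    offset = 0
    for ev in gesture_events:
        shifted = ev.copy()
        shifted['t'] = ev['t'] - ev['t'].min() + offset
        chained.append(shifted)
        offset = shifted['t'].max() + gap_us
    return np.concatenate(chained)

def make_chain_labels(num_classes, chain_len):
    """Enumerate all ordered gesture chains of a given length.

    A chain of `chain_len` gestures drawn from `num_classes` classes yields
    num_classes ** chain_len distinct labels, which is what forces a model
    to resolve the temporal ordering of events rather than just their content.
    """
    return list(product(range(num_classes), repeat=chain_len))
```

Under these assumptions, the temporal complexity of the benchmark can be scaled arbitrarily by increasing `chain_len`, since the label space grows exponentially while the spatial content of each gesture stays fixed.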