Recent advances in technology have highlighted the importance of video analysis in computer vision. However, video analysis with traditional artificial neural networks (ANNs) has a considerably high computational cost. Spiking neural networks (SNNs) are third-generation, biologically plausible models that process information in the form of spikes. Unsupervised learning with SNNs using the spike-timing-dependent plasticity (STDP) rule has the potential to overcome some bottlenecks of regular ANNs, but STDP-based SNNs are still immature and their performance is far behind that of ANNs. In this work, we study the performance of SNNs on human action recognition, a task with many real-time applications in computer vision, such as video surveillance. We introduce a multi-layered 3D convolutional SNN model trained with unsupervised STDP and compare its performance with that of a 2D STDP-based SNN on the KTH and Weizmann datasets. We also compare single-layer and multi-layer versions of these models in order to get an accurate assessment of their performance. We show that STDP-based convolutional SNNs can learn motion patterns using 3D kernels, thus enabling motion-based recognition from videos. Finally, we give evidence that 3D convolution is superior to 2D convolution with STDP-based SNNs, especially when dealing with long video sequences.