We present a new deep learning approach for real-time 3D human action recognition from skeletal data and apply it to develop a vision-based intelligent surveillance system. Given a skeleton sequence, we propose to encode the skeleton poses and their motions into a single RGB image. An Adaptive Histogram Equalization (AHE) algorithm is then applied to the color images to enhance their local patterns and generate more discriminative features. For learning and classification, we design deep neural networks based on the Densely Connected Convolutional Architecture (DenseNet) to extract features from the enhanced color images and classify them into action classes. Experimental results on two challenging datasets show that the proposed method reaches state-of-the-art accuracy while requiring low computational time for training and inference. This paper also introduces CEMEST, a new RGB-D dataset depicting passenger behaviors in public transport. It consists of 203 untrimmed real-world surveillance videos of realistic normal and anomalous events. We achieve promising results under the real conditions of this dataset with the support of data augmentation and transfer learning techniques. This enables the construction of real-world deep learning applications for enhancing monitoring and security in public transport.
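The idea of turning a skeleton sequence into a single RGB image can be illustrated with a minimal sketch. The encoding below is an assumption chosen for illustration, not the representation used in the paper: joint coordinates (x, y, z) are mapped to the (R, G, B) channels, rows index joints, columns index time, and frame-to-frame differences stand in for motion. The function names `normalize_to_uint8` and `encode_skeleton_sequence` are hypothetical, and the AHE enhancement step and the DenseNet classifier are omitted.

```python
import numpy as np


def normalize_to_uint8(values):
    """Min-max scale an array to the 0..255 range (constant input maps to 0)."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros(values.shape, dtype=np.uint8)
    scaled = (values - lo) / (hi - lo)
    return np.round(255.0 * scaled).astype(np.uint8)


def encode_skeleton_sequence(sequence):
    """Encode a (frames, joints, 3) skeleton sequence as one RGB image.

    Assumed layout (illustrative only): the left block holds poses,
    the right block holds motions, both with joints as rows and time
    as columns, and (x, y, z) mapped to (R, G, B).
    """
    sequence = np.asarray(sequence, dtype=np.float64)
    if sequence.ndim != 3 or sequence.shape[2] != 3:
        raise ValueError("expected an array of shape (frames, joints, 3)")
    if sequence.shape[0] < 2:
        raise ValueError("need at least two frames to compute motion")

    # Motion is approximated here by differences between consecutive frames.
    motions = np.diff(sequence, axis=0)          # (frames - 1, joints, 3)

    # Scale poses and motions separately, since their value ranges differ.
    pose_img = normalize_to_uint8(sequence).transpose(1, 0, 2)
    motion_img = normalize_to_uint8(motions).transpose(1, 0, 2)

    # Concatenate along the time axis: (joints, 2 * frames - 1, 3).
    return np.concatenate([pose_img, motion_img], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(30, 20, 3))          # 30 frames, 20 joints
    image = encode_skeleton_sequence(demo)
    print(image.shape, image.dtype)
```

The resulting array can be passed to any image pipeline. Scaling poses and motions separately is a design choice of this sketch, so that the typically small motion values are not compressed into a narrow intensity band.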