Learning Representative Temporal Features for Action Recognition

In this paper, we present a novel video classification method for efficiently recognizing different categories of third-person videos. Our motivation is a lightweight model that can be trained even when training data is scarce. To this end, the processing of the three-dimensional video input is decomposed into 2D processing in the spatial dimensions followed by 1D processing in the temporal dimension. The 2D spatial processing is carried out by pre-trained networks and requires no training phase. The only step that involves training is the classification of the 1D time series obtained from the descriptions of the 2D signals. Specifically, optical flow images are first computed from consecutive frames and described by pre-trained CNNs. The dimensionality of these descriptors is then reduced using PCA. Stacking the description vectors side by side yields a multi-channel time series for each video, in which each channel represents a specific feature and tracks it over time. The main focus of the proposed method is to classify the resulting time series effectively. To do so, we let the machine learn the temporal features by training a multi-channel one-dimensional convolutional neural network (1D-CNN). The 1D-CNN learns features along the temporal dimension only. Hence, the number of trainable parameters decreases significantly, which makes the method trainable even on smaller datasets. We show that the proposed method reaches state-of-the-art results on two public datasets, UCF11 and jHMDB, and competitive results on HMDB51.
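As a rough illustration of the data flow described in the abstract, the following NumPy sketch reduces per-frame descriptors with PCA, stacks them into a multi-channel time series, and applies one multi-channel 1D convolution along time. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `pca_reduce` and `conv1d_multichannel`, all sizes (40 frames, 512-dimensional descriptors, 16 channels, 8 filters of width 5), and the random stand-in descriptors are hypothetical. In practice the descriptors would come from a pre-trained CNN applied to optical flow images, the PCA projection would presumably be fitted on training data rather than on a single video, and the convolution weights would be learned rather than random.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project row vectors onto their top principal components (via SVD)."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def conv1d_multichannel(series, kernels, bias):
    """Valid 1D convolution along time, followed by ReLU.

    series:  (channels, T)
    kernels: (filters, channels, width)
    bias:    (filters,)
    returns: (filters, T - width + 1)
    """
    n_filters, _, width = kernels.shape
    steps = series.shape[1] - width + 1
    out = np.empty((n_filters, steps))
    for t in range(steps):
        window = series[:, t:t + width]  # (channels, width)
        # Each filter sums over all channels and all taps of the window.
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_frames, cnn_dim, n_channels = 40, 512, 16
n_filters, kernel_width = 8, 5

# Random stand-in for CNN descriptors of the optical flow images (one row per frame).
flow_descriptors = rng.normal(size=(n_frames, cnn_dim))

# Reduce each descriptor to n_channels values, then put time on the last axis:
# each row of time_series follows one reduced feature over the video.
reduced = pca_reduce(flow_descriptors, n_channels)   # (n_frames, n_channels)
time_series = reduced.T                              # (n_channels, n_frames)

# One 1D convolutional layer with random (untrained) weights.
kernels = rng.normal(size=(n_filters, n_channels, kernel_width)) * 0.1
bias = np.zeros(n_filters)
feature_maps = conv1d_multichannel(time_series, kernels, bias)

# Weights span only channels x kernel width, plus one bias per filter.
n_params = kernels.size + bias.size

print(time_series.shape, feature_maps.shape, n_params)
```

Because the kernels slide along the temporal axis only, the parameter count of the layer depends on the number of channels, the kernel width, and the number of filters, and not on the spatial resolution of the frames.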
