In this paper, a novel video classification methodology is presented that aims to recognize different categories of third-person videos efficiently. The idea is to keep track of motion in videos by following optical-flow elements over time. To classify the resulting motion time series efficiently, the machine is made to learn temporal features along the time dimension. This is done by training a multi-channel one-dimensional Convolutional Neural Network (1D-CNN). Since CNNs represent the input data hierarchically, high-level features are obtained by further processing the features of lower-level layers; in the case of time series, long-term temporal features are thus extracted from short-term ones. Moreover, an advantage of the proposed method over most deep-learning-based approaches is that it learns representative temporal features only along the time dimension. This significantly reduces the number of learnable parameters, which makes the method trainable even on smaller datasets. We show that, with the aid of a more efficient feature-vector representation, the proposed method reaches state-of-the-art results on two public datasets, UCF11 and jHMDB.
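To illustrate the core operation the abstract describes, the following is a minimal sketch of a multi-channel 1D convolution over motion time series. The channel layout, kernel values, and function name are illustrative assumptions, not the authors' implementation; in practice such layers would be stacked (so higher layers cover longer time spans, yielding long-term features from short-term ones) and their weights learned by training.

```python
# Hypothetical sketch: multi-channel 1D convolution, where each input
# channel is one motion time series (e.g. horizontal and vertical
# displacement of a tracked optical-flow element across frames).

def conv1d(signal, kernels, bias=0.0):
    """Valid-mode multi-channel 1D convolution.

    signal : list of channels, each a list of T floats.
    kernels: one kernel per input channel, each a list of K floats.
    Returns a single output channel of length T - K + 1.
    """
    num_channels = len(signal)
    K = len(kernels[0])
    T = len(signal[0])
    out = []
    for t in range(T - K + 1):
        acc = bias
        for c in range(num_channels):      # sum over input channels
            for k in range(K):             # slide kernel along time
                acc += signal[c][t + k] * kernels[c][k]
        out.append(acc)
    return out

# Two illustrative motion channels over six frames.
x = [[0.0, 1.0, 0.0, -1.0, 0.0, 1.0],   # e.g. horizontal flow
     [1.0, 1.0, 1.0,  1.0, 1.0, 1.0]]   # e.g. vertical flow

# One width-3 temporal kernel per channel (hand-picked here; learned
# in the actual method). A width-3 kernel sees 3 frames; stacking two
# such layers would widen the receptive field to 5 frames, and so on.
w = [[0.5, 0.0, -0.5],    # finite-difference-like filter on channel 0
     [0.25, 0.5, 0.25]]   # smoothing filter on channel 1

y = conv1d(x, w)
print(y)  # one short-term temporal feature per valid time step
```

A full 1D-CNN would apply many such kernels per layer (producing multiple output channels), interleave nonlinearities and pooling, and learn the kernel weights end-to-end; this sketch only shows why the parameter count stays small, since kernels extend along the time dimension alone.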