Most prior work on action recognition either relies on a large number of training samples or requires pre-trained models transferred from other large datasets to mitigate overfitting. This restricts such research to organizations with substantial computational resources. In this work, we propose a data-efficient framework that trains the model from scratch on small datasets while still achieving promising results. Specifically, we introduce a 3D central difference convolution operation and use it to build a novel two-stream framework (Rank Pooling RGB and Optical Flow) based on the C3D neural network. The method is validated on the action recognition track of the ECCV 2020 VIPriors challenges, where it achieved 2nd place (88.31%). This result shows that our method can perform well even without a model pre-trained on large-scale datasets. The code will be released soon.
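The abstract does not define the 3D central difference convolution, so the following is only a minimal sketch under a stated assumption: it uses the commonly cited formulation of central difference convolution, in which the kernel is applied to the differences between each neighbour and the centre voxel and the result is blended with a vanilla convolution. This is equivalent to y = conv(x, w) - theta * x_centre * sum(w). The function names, the single-channel nested-list layout `x[t][h][w]`, and the hyperparameter `theta` (with its illustrative default) are hypothetical and not taken from the authors' code.

```python
def conv3d_valid(x, w):
    """Plain 3D cross-correlation (stride 1, no padding) on nested lists x[t][h][w]."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(w), len(w[0]), len(w[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                acc = 0.0
                for a in range(kt):
                    for b in range(kh):
                        for c in range(kw):
                            acc += w[a][b][c] * x[t + a][i + b][j + c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def cdc3d_valid(x, w, theta=0.7):
    """Assumed central difference variant of conv3d_valid.

    theta = 0 reduces to the vanilla convolution; theta = 1 convolves only
    the differences between each neighbour and the centre voxel.
    The default value of theta is illustrative, not taken from the paper.
    """
    kt, kh, kw = len(w), len(w[0]), len(w[0][0])
    ct, ch, cw = kt // 2, kh // 2, kw // 2  # offset of the kernel centre
    w_sum = sum(v for plane in w for row in plane for v in row)
    vanilla = conv3d_valid(x, w)
    return [
        [
            [
                vanilla[t][i][j] - theta * w_sum * x[t + ct][i + ch][j + cw]
                for j in range(len(vanilla[0][0]))
            ]
            for i in range(len(vanilla[0]))
        ]
        for t in range(len(vanilla))
    ]


if __name__ == "__main__":
    # A constant 3x3x3 clip carries no local variation, so the pure
    # difference term (theta = 1) responds with zero.
    flat = [[[2.0] * 3 for _ in range(3)] for _ in range(3)]
    ones = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
    print(conv3d_valid(flat, ones)[0][0][0])
    print(cdc3d_valid(flat, ones, theta=1.0)[0][0][0])
```

With `theta = 1` the operation ignores absolute intensity and responds only to local spatio-temporal change, which is one plausible reason such an operator could help when training from scratch on little data. That reading is an interpretation, not a claim made in the abstract.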