This report describes the iqiyi submission to the ActivityNet 2019 Kinetics-700 challenge. Three models are used in the model ensemble stage: TSN, HG-NL and StNet. We propose the hierarchical group-wise non-local (HG-NL) module for aggregating frame-level features in video classification. The standard non-local (NL) module is effective at aggregating frame-level features for video classification, but it has low parameter efficiency and high computational cost. The HG-NL method uses a hierarchical group-wise structure and generates multiple attention maps to enhance performance. Based on this hierarchical group-wise structure, the proposed method achieves competitive accuracy with fewer parameters and lower computational cost than the standard NL. On the ActivityNet 2019 Kinetics-700 challenge, after model ensembling, we obtain an averaged top-1 and top-5 error of 28.444% on the test set.
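To make the group-wise idea concrete, the following is a minimal NumPy sketch of a generic group-wise non-local aggregation over frame-level features. It is not the authors' HG-NL implementation: the hierarchical part is omitted, and the function names (`groupwise_nonlocal`, `projection_params`), the per-group projection weights, the softmax attention, and the residual connection are all assumptions made for illustration. It only shows how splitting channels into groups yields one attention map per group and shrinks the projection parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def groupwise_nonlocal(frames, w_theta, w_phi, w_g):
    """Hypothetical group-wise non-local aggregation (illustrative only).

    frames:  (T, C) frame-level features for one video.
    w_theta, w_phi, w_g: (G, C // G, C // G) per-group projections (assumed).
    Returns the aggregated features (T, C) and attention maps (G, T, T).
    """
    T, C = frames.shape
    G, d, _ = w_theta.shape
    assert G * d == C, "channels must split evenly into groups"

    # Split channels into G groups: (T, C) -> (G, T, d).
    x = frames.reshape(T, G, d).transpose(1, 0, 2)

    # Per-group embeddings via batched matrix multiplication.
    theta = x @ w_theta
    phi = x @ w_phi
    g = x @ w_g

    # One (T, T) attention map per group, normalized over frames.
    attn = softmax(theta @ phi.transpose(0, 2, 1), axis=-1)

    # Aggregate frames within each group, then merge the groups back.
    out = (attn @ g).transpose(1, 0, 2).reshape(T, C)

    # Residual connection, as in a standard non-local block (assumption).
    return frames + out, attn


def projection_params(C, G):
    """Weights in the three projections when channels are split into G groups."""
    return 3 * G * (C // G) ** 2
```

Under these assumptions, `projection_params(1024, 8)` is eight times smaller than `projection_params(1024, 1)`, which illustrates why a grouped design can use fewer parameters than full-channel projections.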
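The reported figure is an average of the top-1 and top-5 errors. A minimal sketch of how such a metric can be computed is given below, assuming it is the arithmetic mean of the two error rates expressed as a percentage; the function name `avg_top1_top5_error` and the toy scores are hypothetical and are not taken from the challenge's evaluation code.

```python
import numpy as np


def avg_top1_top5_error(scores, labels):
    """Mean of top-1 and top-5 error, in percent (assumed metric definition).

    scores: (N, K) array of class scores, with K >= 5.
    labels: (N,) array of integer ground-truth class indices.
    """
    # Indices of the five highest-scoring classes per sample, best first.
    top5 = np.argsort(-scores, axis=1)[:, :5]

    # Top-1 error: the best prediction differs from the label.
    top1_err = np.mean(top5[:, 0] != labels)

    # Top-5 error: the label is absent from the five best predictions.
    top5_err = np.mean(~(top5 == labels[:, None]).any(axis=1))

    return 100.0 * (top1_err + top5_err) / 2.0


# Toy example with 3 samples and 6 classes (made-up scores).
toy_scores = np.array([
    [0.60, 0.10, 0.05, 0.15, 0.07, 0.03],  # label 0 ranked first
    [0.50, 0.30, 0.05, 0.08, 0.04, 0.03],  # label 1 ranked second
    [0.30, 0.25, 0.01, 0.20, 0.14, 0.10],  # label 2 ranked last
])
toy_labels = np.array([0, 1, 2])
toy_error = avg_top1_top5_error(toy_scores, toy_labels)
```

In the toy input, two of three samples miss at top-1 and one of three misses at top-5, so the averaged error comes out at about 50%.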