This paper presents our solution to the AVA-Kinetics Crossover Challenge of the ActivityNet workshop at CVPR 2021. Our solution uses multiple types of relation modeling methods for spatio-temporal action detection and adopts a training strategy that integrates these relation modeling methods in end-to-end training over the two large-scale video datasets. We also investigate learning with a memory bank and fine-tuning for the long-tailed class distribution to further improve performance. In this paper, we detail the implementation of our solution and provide experimental results and corresponding discussions. We finally achieve 40.67 mAP on the test set of AVA-Kinetics.