We present our three-branch solution for the International Challenge on Activity Recognition at CVPR 2019. The model fuses richer information from the global video clip, short-term human attention, and long-term human activity into a unified model. We participated in two tasks: Task A, the Kinetics challenge, and Task B, the spatio-temporal action localization (AVA) challenge. For Kinetics, we achieve a 21.59% error rate. For the AVA challenge, our final model obtains 32.49% mAP on the test set, which outperforms all submissions to the AVA challenge at CVPR 2018 by more than 10% mAP. In future work, we will introduce human activity knowledge, a new dataset that includes key information about human activity.