In this report, we present our champion solutions to five tracks of the Ego4D challenge. We apply InternVideo, a video foundation model we developed, to five Ego4D tasks: Moment Queries, Natural Language Queries, Future Hand Prediction, State Change Object Detection, and Short-term Object Interaction Anticipation. InternVideo-Ego4D is an effective paradigm for adapting a strong foundation model to downstream egocentric video understanding tasks with simple head designs. On all five tasks, InternVideo-Ego4D surpasses both the baseline methods and the CVPR 2022 champions, demonstrating the strong representation ability of InternVideo as a video foundation model. Our code will be released at https://github.com/OpenGVLab/ego4d-eccv2022-solutions