The video relation detection problem refers to detecting the relationships between different objects in videos, such as spatial relationships and action relationships. In this paper, we present video relation detection with trajectory-aware multi-modal features to solve this task. Considering the complexity of visual relation detection in videos, we decompose the task into three sub-tasks: object detection, trajectory proposal, and relation prediction. We use a state-of-the-art object detection method to ensure accurate object trajectory detection, and a multi-modal feature representation to help predict the relations between objects. Our method won first place in the video relation detection task of the Video Relation Understanding Grand Challenge at ACM Multimedia 2020 with 11.74\% mAP, surpassing other methods by a large margin.
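The three-stage decomposition can be illustrated with a minimal sketch. The abstract does not specify how trajectories are proposed or how pairs are formed, so everything here is an assumption made for illustration: the greedy IoU linking rule, the `iou_threshold` value, and the helper names `iou`, `link_trajectories`, and `candidate_pairs` are hypothetical and are not the authors' method. Per-frame detections are taken as given (stage 1), linked into trajectories (stage 2), and enumerated as ordered subject-object pairs that a relation classifier would then score (stage 3, not implemented here).

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_trajectories(frames, iou_threshold=0.5):
    """Greedily link per-frame detections into trajectories (assumed rule).

    frames: one list per frame, each containing (label, box) detections.
    Returns dicts with keys 'label', 'start' (first frame index), 'boxes'.
    """
    finished, active = [], []
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        still_active = []
        for traj in active:
            # Pick the same-label detection that best overlaps the last box.
            best, best_iou = None, iou_threshold
            for det in unmatched:
                label, box = det
                if label != traj["label"]:
                    continue
                score = iou(traj["boxes"][-1], box)
                if score >= best_iou:
                    best, best_iou = det, score
            if best is None:
                finished.append(traj)  # no continuation: trajectory ends
            else:
                traj["boxes"].append(best[1])
                unmatched.remove(best)
                still_active.append(traj)
        # Every detection left unmatched starts a new trajectory.
        for label, box in unmatched:
            still_active.append({"label": label, "start": t, "boxes": [box]})
        active = still_active
    return finished + active


def candidate_pairs(trajectories):
    """Ordered (subject, object) trajectory pairs that overlap in time."""
    pairs = []
    for s, o in permutations(trajectories, 2):
        s_end = s["start"] + len(s["boxes"])
        o_end = o["start"] + len(o["boxes"])
        if max(s["start"], o["start"]) < min(s_end, o_end):
            pairs.append((s, o))
    return pairs
```

In a full system, each pair returned by `candidate_pairs` would be passed to a relation predictor together with its features; the sketch stops at pair enumeration because the abstract gives no detail about the classifier.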