Modeling spatial-temporal relations is essential for recognizing human actions, especially when a person interacts with objects and multiple objects appear around that person in different ways over time. Most existing action recognition models focus on learning the overall visual cues of a scene and disregard informative fine-grained features, which can be captured by learning human-object relationships and interactions. In this paper, we learn human-object relationships by exploiting the interaction between their local and global contexts. To this end, we propose the Global-Local Interaction Distillation Network (GLIDN), which learns human and object interactions through space and time via knowledge distillation for fine-grained scene understanding. GLIDN encodes humans and objects as graph nodes and learns local and global relations via a graph attention network. The local context graphs learn the relations between humans and objects at the frame level by capturing their co-occurrence at a specific time step. The global relation graph is constructed from video-level human and object interactions and identifies their long-term relations throughout a video sequence. More importantly, we investigate how knowledge from each of these graphs can be distilled to its counterpart to improve human-object interaction (HOI) recognition. We evaluate our model through comprehensive experiments on two datasets, Charades and CAD-120. On both, our model achieves better results than the baselines and counterpart approaches.
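
As an illustration of the distillation idea described above, the following minimal sketch computes a soft-target loss between the class predictions of a global branch and a local branch, in both directions. It is not the GLIDN implementation: the abstract does not specify the loss, so the temperature-scaled KL divergence used here is an assumed, commonly used form of knowledge distillation, and the names `distillation_loss`, `global_logits`, and `local_logits` as well as the example values are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Assumed soft-target loss: T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


# Hypothetical per-class logits produced by the two graph branches.
global_logits = [2.0, 0.5, -1.0]
local_logits = [1.5, 0.8, -0.5]

# Distill in both directions, so that each graph can learn from its counterpart.
loss_global_to_local = distillation_loss(global_logits, local_logits)
loss_local_to_global = distillation_loss(local_logits, global_logits)
```

In a training setup, terms like these would typically be added to the ordinary classification loss of each branch; how the terms are weighted is a design choice that the abstract does not state.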