Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting the relations between proposals during learning. However, these relations play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and the relation between two proposals as an edge. Here, we use two types of relations: one captures the context information for each proposal, and the other characterizes the correlations between distinct actions. We then apply GCNs over the graph to model the relations among different proposals and learn powerful representations for action classification and localization. Experimental results show that our approach significantly outperforms the state of the art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships. Code is available at https://github.com/Alvin-Zeng/PGCN.
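To make the graph construction concrete, the following is a minimal sketch in plain Python and not the released implementation. It rests on assumptions that the abstract does not state: proposals are `(start, end)` segments in seconds; a "context" edge links two proposals whose temporal IoU exceeds a hypothetical threshold `iou_thresh`; an edge between distinct actions links non-overlapping proposals whose center distance, normalized by their combined length, falls below a hypothetical threshold `dist_thresh`; and the graph convolution is a generic mean-aggregation layer, ReLU(D^-1 A X W). All function names and default values are illustrative.

```python
from typing import List, Tuple

Proposal = Tuple[float, float]  # (start, end) in seconds; an assumed representation
Matrix = List[List[float]]


def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Intersection-over-union of two temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def normalized_distance(a: Proposal, b: Proposal) -> float:
    """Center distance divided by the combined length of both segments."""
    total = (a[1] - a[0]) + (b[1] - b[0])
    gap = abs((a[0] + a[1]) / 2 - (b[0] + b[1]) / 2)
    return gap / total if total > 0 else float("inf")


def build_adjacency(props: List[Proposal],
                    iou_thresh: float = 0.7,    # assumed value
                    dist_thresh: float = 1.0    # assumed value
                    ) -> Matrix:
    """Binary adjacency matrix with self-loops over the proposal graph."""
    n = len(props)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop keeps each node's own feature
        for j in range(i + 1, n):
            iou = temporal_iou(props[i], props[j])
            # Assumed "context" relation: heavily overlapping proposals.
            contextual = iou > iou_thresh
            # Assumed "distinct action" relation: disjoint but nearby proposals.
            surrounding = (iou == 0.0 and
                           normalized_distance(props[i], props[j]) < dist_thresh)
            if contextual or surrounding:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One generic graph convolution: ReLU(D^-1 A X W)."""
    out = []
    for row in adj:
        deg = sum(row)  # at least 1 because of the self-loop
        # Aggregate: average the features of all connected proposals.
        agg = [sum(row[j] * feats[j][k] for j in range(len(feats))) / deg
               for k in range(len(feats[0]))]
        # Transform: linear projection followed by ReLU.
        out.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(len(weight))))
                    for m in range(len(weight[0]))])
    return out


# Toy usage: three proposals around one action and one far-away proposal.
proposals = [(0.0, 10.0), (1.0, 10.0), (12.0, 15.0), (100.0, 104.0)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

adjacency = build_adjacency(proposals)
refined = gcn_layer(adjacency, features, identity)
```

In this toy example the first three proposals end up mutually connected, so each of their refined features is the average over that group, while the isolated fourth proposal keeps its own feature. In a full system the refined features would feed the classification and boundary-regression heads.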