Attempting to fully exploit the temporal diversity and chronological characteristics of videos for self-supervised video representation learning, this work takes advantage of the temporal dependencies within videos and proposes a novel self-supervised method named Temporal Contrastive Graph Learning (TCGL). In contrast to existing methods that ignore elaborate temporal dependencies, our TCGL is rooted in a hybrid graph contrastive learning strategy that jointly treats the inter-snippet and intra-snippet temporal dependencies as self-supervision signals for temporal representation learning. To model multi-scale temporal dependencies, our TCGL integrates prior knowledge about frame and snippet orders into graph structures, i.e., the intra-/inter-snippet temporal contrastive graphs. By randomly removing edges and masking nodes of the intra-snippet or inter-snippet graphs, our TCGL can generate different correlated graph views. Specific contrastive learning modules are then designed to maximize the agreement between nodes in the different views. To adaptively learn the global context representation and recalibrate the channel-wise features, we introduce an adaptive video snippet order prediction module, which leverages the relational knowledge among video snippets to predict the actual snippet order. Experimental results demonstrate the superiority of our TCGL over state-of-the-art methods on large-scale action recognition and video retrieval benchmarks.
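The view-generation step described above, randomly removing edges and masking nodes of a temporal graph to produce correlated views, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions (the function name, drop/mask probabilities, and zero-masking of node features are hypothetical choices for exposition, not the authors' implementation):

```python
import numpy as np

def generate_graph_view(adj, feats, edge_drop_p=0.2, node_mask_p=0.2, rng=None):
    """Produce one correlated view of a graph (adj, feats) by randomly
    dropping edges and masking node features. Illustrative sketch only,
    not the TCGL authors' code."""
    rng = np.random.default_rng(rng)
    # Randomly remove edges; keep the adjacency symmetric for an undirected graph.
    keep = rng.random(adj.shape) >= edge_drop_p
    keep = np.triu(keep, 1)          # sample each undirected edge once
    keep = keep | keep.T
    adj_view = adj * keep
    # Randomly mask (zero out) entire node feature vectors.
    node_keep = rng.random(feats.shape[0]) >= node_mask_p
    feats_view = feats * node_keep[:, None]
    return adj_view, feats_view
```

Two independent calls yield two correlated views of the same snippet graph, between which a contrastive objective can then maximize node-level agreement.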