Self-supervised video representation learning has been shown to effectively improve downstream tasks such as video retrieval and action recognition. In this paper, we present Cascade Positive Retrieval (CPR), which successively mines positive examples w.r.t. the query for contrastive learning in a cascade of stages. Specifically, CPR exploits multiple views of a query example in different modalities, where an alternative view may help find another positive example that is dissimilar to the query in the query view. Through ablations, we explore the effects of CPR configurations, including the number of mining stages, the ratio of top similar examples selected in each stage, and progressive training with an incrementally increasing final Top-k selection. Overall mining quality is measured by recall across the training-set classes. CPR reaches a median class mining recall of 83.3%, outperforming previous work by 5.5%. Implementation-wise, CPR is complementary to pretext tasks and can be easily applied to previous work. When pretrained on UCF101, CPR consistently improves existing work and achieves state-of-the-art results: R@1 of 56.7% and 24.4% in video retrieval, and 83.8% and 54.8% in action recognition, on UCF101 and HMDB51, respectively. The code is available at https://github.com/necla-ml/CPR.
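The cascaded mining step can be illustrated with a small sketch. This is a hypothetical illustration written from the description above, not the authors' implementation (the linked repository holds that). It assumes precomputed per-view embeddings (e.g. RGB and optical flow), cosine similarity, one top-similar selection ratio per stage, and that the final Top-k positives are ranked by the last stage's similarity. The names `cascade_positive_mining`, `views`, `ratios`, and `top_k` are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cascade_positive_mining(
    query: int,
    views: List[Dict[int, Vector]],
    ratios: List[float],
    top_k: int,
) -> List[int]:
    """Hypothetical sketch of cascaded positive mining (assumed design).

    views[s] maps example id -> embedding in the view used at stage s.
    ratios[s] is the fraction of surviving candidates kept at stage s.
    The first top_k survivors of the last stage are returned as positives.
    """
    if len(views) != len(ratios):
        raise ValueError("need one selection ratio per stage")
    # Start from every other example in the training set.
    candidates = [i for i in views[0] if i != query]
    for view, ratio in zip(views, ratios):
        q = view[query]
        scores = {i: cosine(q, view[i]) for i in candidates}
        # Rank the survivors in this stage's view, keep the top fraction.
        ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
        keep = max(1, math.ceil(ratio * len(ranked)))
        candidates = ranked[:keep]
    # candidates is sorted by the last stage's similarity.
    return candidates[:top_k]


if __name__ == "__main__":
    # Toy 2-D embeddings for five examples in two assumed views.
    rgb = {0: [1, 0], 1: [0.9, 0.1], 2: [0, 1], 3: [0.8, 0.2], 4: [0.7, 0.3]}
    flow = {0: [1, 0], 1: [0, 1], 2: [1, 0], 3: [0.95, 0.05], 4: [0.6, 0.4]}
    print(cascade_positive_mining(0, [rgb, flow], [0.75, 1.0], top_k=2))
    # → [3, 4]
```

In this toy run, example 1 is the closest match to the query in the RGB view and survives the first stage, but it is orthogonal to the query in the flow view, so the second stage ranks it last and it is not among the final positives. Ordering the stages and choosing the per-stage ratios are the configuration knobs the ablations above refer to; the values here are arbitrary.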