Self-supervised video representation learning has been shown to effectively improve downstream tasks such as video retrieval and action recognition. In this paper, we present Cascade Positive Retrieval (CPR), which successively mines positive examples w.r.t. the query for contrastive learning in a cascade of stages. Specifically, CPR exploits multiple views of a query example in different modalities, where an alternative view may help find another positive example that is dissimilar in the query view. In ablations, we explore the effects of possible CPR configurations, including the number of mining stages, the ratio of top similar examples selected at each stage, and progressive training with an incrementally larger final Top-k selection. Overall mining quality is measured as the recall across training-set classes. CPR reaches a median class mining recall of 83.3%, outperforming previous work by 5.5%. Implementation-wise, CPR is complementary to pretext tasks and can be easily applied to previous work. When pretraining on UCF101, CPR consistently improves existing methods and even achieves state-of-the-art R@1 of 56.7% and 24.4% in video retrieval as well as 83.8% and 54.8% in action recognition on UCF101 and HMDB51, respectively. The code is available at https://github.com/necla-ml/CPR.
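The cascade mining procedure described above can be sketched as follows. This is a minimal illustration under assumed names and data shapes, not the authors' implementation: each stage ranks the surviving candidate pool by cosine similarity in one modality view and keeps only the top fraction given by that stage's selection ratio, so a later view can promote positives that look dissimilar in the query view; the final Top-k survivors serve as mined positives for the contrastive loss.

```python
import numpy as np

def cascade_positive_mining(query_views, bank_views, keep_ratios, top_k):
    """Successively narrow a candidate pool of memory-bank examples, one view per stage.

    query_views: list of 1-D arrays, one query embedding per modality view.
    bank_views:  list of 2-D arrays (N x D_v), bank embeddings in each view.
    keep_ratios: per-stage fraction of the current pool to keep (the selection ratio).
    top_k:       number of final positives returned after the last stage.
    """
    n = bank_views[0].shape[0]
    pool = np.arange(n)  # candidate indices, shrunk stage by stage
    for q, bank, ratio in zip(query_views, bank_views, keep_ratios):
        # cosine similarity between the query and surviving candidates in this view
        sims = bank[pool] @ q / (
            np.linalg.norm(bank[pool], axis=1) * np.linalg.norm(q) + 1e-8
        )
        keep = max(top_k, int(len(pool) * ratio))
        pool = pool[np.argsort(-sims)[:keep]]  # keep the top fraction for the next stage
    return pool[:top_k]  # mined positives for the contrastive loss
```

A two-stage setup with, say, an RGB view followed by an optical-flow view would pass two entries in `query_views`/`bank_views`; progressive training as ablated in the paper would simply grow `top_k` over the course of pretraining.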