Auxiliary Learning for Self-Supervised Video Representation via Similarity-based Knowledge Distillation

Despite the outstanding success of self-supervised pretraining methods for video representation learning, they generalise poorly when the unlabelled dataset for pretraining is small or the domain difference between the unlabelled data in the source task (pretraining) and the labelled data in the target task (finetuning) is significant. To mitigate these issues, we propose a novel approach to complement self-supervised pretraining via an auxiliary pretraining phase based on knowledge similarity distillation, auxSKD, for better generalisation with a significantly smaller amount of video data, e.g. Kinetics-100 rather than Kinetics-400. Our method deploys a teacher network that iteratively distils its knowledge to the student model by capturing the similarity information between segments of unlabelled video data. The student model then solves a pretext task by exploiting this prior knowledge. We also introduce a novel pretext task, Video Segment Pace Prediction (VSPP), which requires our model to predict the playback speed of a randomly selected segment of the input video to provide more reliable self-supervised representations. Our experiments show results superior to the state of the art on both the UCF101 and HMDB51 datasets when pretraining on Kinetics-100 (K100). Additionally, we show that our auxiliary pretraining, auxSKD, when added as an extra pretraining phase to recent state-of-the-art self-supervised methods (e.g. VideoPace and RSPNet), improves their results on UCF101 and HMDB51. Our code will be released soon.
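
To make the VSPP idea concrete, the following is a minimal sketch of how a training sample for such a task might be constructed. It is not the released implementation: the function name `vspp_frame_indices`, the equal-length segment split, the clip length of 16 frames, and the candidate paces `(1, 2, 4, 8)` are all assumptions made for illustration. One randomly chosen segment of the clip is sampled with a randomly chosen frame stride (its pace), the remaining segments are sampled at normal speed, and the pace class serves as the self-supervised label.

```python
import random


def vspp_frame_indices(num_frames, clip_len=16, num_segments=4,
                       paces=(1, 2, 4, 8), rng=None):
    """Pick frame indices for one clip and a pace label (hypothetical sketch).

    The clip is split into `num_segments` equal segments. One segment,
    chosen at random, is sampled with a random stride from `paces`;
    all other segments use stride 1. Returns (indices, segment, pace_label).
    """
    rng = rng or random.Random()
    seg_len = clip_len // num_segments

    # Randomly choose which segment is altered and which pace class it gets.
    segment = rng.randrange(num_segments)
    pace_label = rng.randrange(len(paces))
    strides = [paces[pace_label] if s == segment else 1
               for s in range(num_segments)]

    # Number of source frames the clip spans; the video must be long enough.
    needed = sum(stride * seg_len for stride in strides)
    if needed > num_frames:
        raise ValueError("video is too short for the chosen pace")

    # Random temporal offset, then walk forward segment by segment.
    pos = rng.randrange(num_frames - needed + 1)
    indices = []
    for stride in strides:
        for _ in range(seg_len):
            indices.append(pos)
            pos += stride
    return indices, segment, pace_label


if __name__ == "__main__":
    idx, seg, pace = vspp_frame_indices(300, rng=random.Random(0))
    print("frame indices:", idx)
    print("altered segment:", seg, "pace class:", pace)
```

In this sketch a model would receive the frames at `indices` and be trained to classify `pace_label`; the segment index is returned only for inspection, since the abstract does not say whether the segment position is also predicted.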
