Self-supervised video representation learning predominantly focuses on discriminating instances generated by simple data augmentation schemes. However, the learned representations often fail to generalize to unseen camera viewpoints. To this end, we propose ViewCLR, which learns self-supervised video representations invariant to camera viewpoint changes. We introduce a view-generator, which can be regarded as a learnable augmentation for any self-supervised pretext task, to generate a latent viewpoint representation of a video. ViewCLR maximizes the similarity between the latent viewpoint representation and the representation from the original viewpoint, enabling the learned video encoder to generalize to unseen camera viewpoints. Experiments on cross-view benchmark datasets, including the NTU RGB+D dataset, show that ViewCLR is a state-of-the-art viewpoint-invariant self-supervised method.
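The similarity maximization described above can be sketched as an InfoNCE-style contrastive objective between the original-viewpoint and latent-viewpoint embeddings. This is a minimal NumPy illustration only: the actual ViewCLR loss, view-generator, and encoder architectures are not specified in the abstract, so all names and choices here (cosine similarity, temperature, in-batch negatives) are assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so the dot product is cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce(z_orig, z_view, temperature=0.1):
    """InfoNCE-style loss: pull each video's latent-viewpoint embedding toward
    its original-viewpoint embedding (diagonal positives) and push it away from
    other videos in the batch (off-diagonal negatives).
    Illustrative sketch; ViewCLR's exact objective may differ."""
    z_orig = l2_normalize(z_orig)
    z_view = l2_normalize(z_view)
    logits = z_orig @ z_view.T / temperature       # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)   # softmax over the batch
    return -np.log(np.diag(probs)).mean()          # cross-entropy on positives

# Toy demo: aligned viewpoint pairs score a lower loss than random pairs.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 16))                       # "original viewpoint" embeddings
loss_matched = info_nce(z, z + 0.05 * rng.normal(size=(4, 16)))
loss_random = info_nce(z, rng.normal(size=(4, 16)))
```

The temperature controls how sharply the loss penalizes hard negatives; 0.1 is a common default in contrastive learning but is purely an assumption here.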