Semi-supervised action recognition is a challenging but important task due to the high cost of data annotation. A common approach to this problem is to assign unlabeled data with pseudo-labels, which are then used as additional supervision in training. Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself. In this work, we propose a more effective pseudo-labeling scheme, called Cross-Model Pseudo-Labeling (CMPL). Concretely, we introduce a lightweight auxiliary network in addition to the primary backbone, and ask them to predict pseudo-labels for each other. We observe that, due to their different structural biases, these two models tend to learn complementary representations from the same video clips. Each model can thus benefit from its counterpart by utilizing cross-model predictions as supervision. Experiments on different data partition protocols demonstrate the significant improvement of our framework over existing alternatives. For example, CMPL achieves $17.6\%$ and $25.1\%$ Top-1 accuracy on Kinetics-400 and UCF-101 using only the RGB modality and $1\%$ labeled data, outperforming our baseline model, FixMatch, by $9.0\%$ and $10.3\%$, respectively.
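The core mechanism described above — two structurally different models exchanging confident predictions on unlabeled clips as pseudo-labels — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the confidence threshold value, and the array-based interface are all assumptions for the sketch.

```python
import numpy as np

def cross_model_pseudo_labels(probs_primary, probs_auxiliary, threshold=0.95):
    """Hypothetical sketch of Cross-Model Pseudo-Labeling (CMPL).

    Each model's confident class probabilities on unlabeled clips
    become hard pseudo-labels for the *other* model (cross-model
    supervision), rather than for itself as in self-training.
    `threshold` is an illustrative FixMatch-style confidence cutoff.
    """
    # Mask of clips where each model is confident enough to teach.
    conf_primary = probs_primary.max(axis=1) >= threshold
    conf_auxiliary = probs_auxiliary.max(axis=1) >= threshold
    # Primary teaches auxiliary, auxiliary teaches primary.
    labels_for_auxiliary = probs_primary.argmax(axis=1)
    labels_for_primary = probs_auxiliary.argmax(axis=1)
    return (labels_for_primary, conf_auxiliary), (labels_for_auxiliary, conf_primary)
```

In training, each model's unsupervised loss would then be computed only on the clips where its counterpart's mask is true, using the counterpart's labels as targets.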