To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract training signals from unlabeled data that are either highly task-unrelated or highly task-specific. We argue that these two approaches, at the two extremes of the task-specificity spectrum, are suboptimal for task performance. Utilizing too little task-specific training signal causes underfitting to the ground-truth labels of downstream tasks, while utilizing too much causes overfitting to those labels. To address this, we propose a novel Class-agnostic Semi-supervised Pretraining (CaSP) framework that achieves a more favorable task-specificity balance when extracting training signals from unlabeled data. Compared to semi-supervised learning, CaSP reduces the task specificity of the training signals by ignoring class information in the pseudo labels and by using a separate pretraining stage that relies only on task-unrelated unlabeled data. On the other hand, CaSP preserves the right amount of task specificity by leveraging box/mask-level pseudo labels. As a result, our pretrained model can better avoid underfitting/overfitting to ground-truth labels when finetuned on the downstream task. Using 3.6M unlabeled data, we achieve a remarkable performance gain of 4.7% over the ImageNet-pretrained baseline on object detection. Our pretrained model also demonstrates excellent transferability to other detection and segmentation tasks/frameworks.
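To make the notion of class-agnostic pseudo labels concrete, the following minimal sketch (our own illustration, not the paper's implementation; the `Detection` type, the `to_class_agnostic_pseudo_labels` helper, and the `score_thresh` value are hypothetical) shows how class-specific detections on an unlabeled image could be reduced to box-only pseudo labels, so that pretraining supervises localization without committing to any category taxonomy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float                             # detector confidence
    class_id: int                            # predicted category


def to_class_agnostic_pseudo_labels(
    detections: List[Detection], score_thresh: float = 0.9
) -> List[Tuple[float, float, float, float]]:
    """Keep only confident boxes and drop their class labels, so the
    resulting pseudo labels carry box-level (task-specific) signal but
    no class-level information."""
    pseudo_boxes = []
    for det in detections:
        if det.score >= score_thresh:
            # class_id is intentionally discarded; every kept box is
            # treated as a generic "object" during pretraining
            pseudo_boxes.append(det.box)
    return pseudo_boxes


# Toy usage: predictions from some detector run on an unlabeled image.
preds = [
    Detection(box=(10, 20, 110, 220), score=0.97, class_id=3),
    Detection(box=(40, 50, 90, 140), score=0.42, class_id=7),  # filtered out
]
print(to_class_agnostic_pseudo_labels(preds))  # [(10, 20, 110, 220)]
```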