Despite the fast progress in training specialized models for various tasks, learning a single general model that works well for many tasks is still challenging for computer vision. Here we introduce multi-task self-training (MuST), which harnesses the knowledge in independent specialized teacher models (e.g., an ImageNet classification model) to train a single general student model. Our approach has three steps. First, we train specialized teachers independently on labeled datasets. We then use the specialized teachers to label an unlabeled dataset to create a multi-task pseudo-labeled dataset. Finally, the dataset, which now contains pseudo labels from teacher models trained on different datasets/tasks, is used to train a student model with multi-task learning. We evaluate the feature representations of the student model on 6 vision tasks including image recognition (classification, detection, segmentation) and 3D geometry estimation (depth and surface normal estimation). MuST is scalable with unlabeled or partially labeled datasets and outperforms both specialized supervised models and self-supervised models when training on large-scale datasets. Lastly, we show MuST can improve upon already strong checkpoints trained with billions of examples. The results suggest self-training is a promising direction to aggregate labeled and unlabeled training data for learning general feature representations.
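To make the three steps concrete, below is a minimal PyTorch sketch of the MuST pipeline. It is an illustrative assumption, not the paper's implementation: the tiny networks, the two toy tasks (classification and a scalar depth proxy), and all names (`Teacher`, `Student`, the synthetic `unlabeled` tensor) are placeholders chosen for brevity.

```python
# Minimal sketch of the three MuST steps, assuming PyTorch.
# All modules, tasks, and names here are illustrative placeholders.
import torch
import torch.nn as nn

# Step 1: specialized teachers, each trained independently on its own
# labeled dataset (training loops omitted; assume pretrained weights).
class Teacher(nn.Module):
    def __init__(self, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, out_dim))

    def forward(self, x):
        return self.net(x)

teachers = {
    "classification": Teacher(out_dim=1000),  # e.g., an ImageNet teacher
    "depth": Teacher(out_dim=1),              # scalar depth proxy for brevity
}

# Step 2: pseudo-label a shared unlabeled dataset with every teacher.
unlabeled = torch.randn(8, 3, 32, 32)  # stand-in for unlabeled images
with torch.no_grad():
    pseudo = {task: t(unlabeled) for task, t in teachers.items()}
    # Hard pseudo labels for the classification task.
    pseudo["classification"] = pseudo["classification"].argmax(dim=1)

# Step 3: one student with a shared backbone and one head per task,
# trained jointly on the multi-task pseudo-labeled dataset.
class Student(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.heads = nn.ModuleDict({
            "classification": nn.Linear(16, 1000),
            "depth": nn.Linear(16, 1),
        })

    def forward(self, x):
        feats = self.backbone(x)
        return {task: head(feats) for task, head in self.heads.items()}

student = Student()
opt = torch.optim.SGD(student.parameters(), lr=0.1)
losses = {"classification": nn.CrossEntropyLoss(), "depth": nn.MSELoss()}

for _ in range(3):  # toy multi-task training loop
    out = student(unlabeled)
    loss = sum(losses[t](out[t], pseudo[t]) for t in teachers)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The sketch's shared backbone with per-task heads reflects the general multi-task-student design the abstract describes; the paper's actual architectures, tasks, and loss weighting are not specified here.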