Rotation prediction (Rotation) is a simple pretext task for self-supervised learning (SSL), in which models learn representations useful for target vision tasks by solving pretext tasks. Although Rotation captures information about object shapes, it captures little information about textures. To address this problem, we introduce a novel pretext task for SSL called image enhanced rotation prediction (IE-Rot). IE-Rot simultaneously solves Rotation and a second pretext task based on image enhancement (e.g., sharpening and solarizing) while remaining simple. By predicting rotation and image enhancement simultaneously, models learn representations that capture information about not only object shapes but also textures. Our experimental results show that IE-Rot models outperform Rotation on various standard benchmarks, including ImageNet classification, PASCAL-VOC detection, and COCO detection/segmentation.
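As a rough illustration of the idea, the sketch below shows how a single IE-Rot training sample with two pretext labels might be generated. It is not the authors' implementation: the abstract does not specify the set of enhancements, their strengths, or the order of operations, so the helper names (`rotate90`, `solarize`, `ENHANCEMENTS`, `make_ie_rot_sample`), the two-entry enhancement list, and the choice to enhance before rotating are all assumptions made for illustration. Images are represented as nested lists of 8-bit grayscale values to keep the example dependency-free.

```python
import random


def rotate90(img, k):
    """Rotate a 2D image (list of rows) counterclockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counterclockwise quarter turn.
        img = [list(row) for row in zip(*img)][::-1]
    return [list(row) for row in img]


def solarize(img, threshold=128):
    """Invert every 8-bit pixel value at or above the threshold."""
    return [[255 - p if p >= threshold else p for p in row] for row in img]


def identity(img):
    """Leave the image unchanged (the 'no enhancement' class)."""
    return [list(row) for row in img]


# Hypothetical enhancement set; the paper's actual choices may differ.
ENHANCEMENTS = [identity, solarize]


def make_ie_rot_sample(img, rng):
    """Return (transformed image, rotation label, enhancement label).

    Assumption: the enhancement is applied first, then the rotation.
    """
    rot_label = rng.randrange(4)                  # 0, 90, 180, or 270 degrees
    enh_label = rng.randrange(len(ENHANCEMENTS))  # which enhancement was applied
    transformed = rotate90(ENHANCEMENTS[enh_label](img), rot_label)
    return transformed, rot_label, enh_label


if __name__ == "__main__":
    image = [[10, 200], [30, 40]]
    sample, rot_label, enh_label = make_ie_rot_sample(image, random.Random(0))
    print(sample, rot_label, enh_label)
```

A model trained on such samples would presumably have one output head per label (a 4-way rotation classifier and an enhancement classifier) sharing a common backbone, so that solving both tasks at once encourages features sensitive to both shape and texture.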