Data labeling for supervised learning is expensive and, under some conditions, infeasible. Self-supervised learning has been proposed to improve learning effectiveness with fewer labeled examples; however, there is little guidance on how much labeled data is needed to achieve adequate results. This study aims to establish a baseline for the proportion of labeled data a model requires to reach accuracy competitive with training on the fully labeled set. The study uses the kaggle.com cats-vs-dogs dataset, MNIST, and Fashion-MNIST, and constructs the self-supervised pretext task by applying random rotation augmentation to the original images. To reveal the true effectiveness of the pretext stage in self-supervised learning, each original dataset is divided into smaller subsets, and training is repeated on each subset with and without pretext pre-training. Results show that pretext pre-training improves accuracy on the downstream classification task by roughly 15% compared to plain supervised learning.
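To make the two-stage procedure concrete, the sketch below illustrates a rotation-based pretext task followed by a downstream classifier on a small labeled subset. It is a minimal illustration, assuming a TensorFlow/Keras workflow on MNIST; the 4-way rotation labels, the helper function `make_rotation_pretext`, the network architecture, and the labeled-data budget `n_labeled` are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
import tensorflow as tf

def make_rotation_pretext(images):
    """Build a pretext dataset: each image is rotated by 0/90/180/270 degrees
    and labeled with the rotation index (0-3). No class labels are needed."""
    rotated, labels = [], []
    for img in images:
        for k in range(4):                       # k quarter-turns
            rotated.append(np.rot90(img, k))
            labels.append(k)
    return np.stack(rotated), np.array(labels)

# Load MNIST (the study also uses Fashion-MNIST and cats-vs-dogs).
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0             # shape (N, 28, 28, 1)

# Pretext stage: train an encoder to predict the rotation applied to each image.
px, py = make_rotation_pretext(x_train[:10000])
encoder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
])
pretext_model = tf.keras.Sequential([encoder, tf.keras.layers.Dense(4, activation="softmax")])
pretext_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
pretext_model.fit(px, py, epochs=3, batch_size=128)

# Downstream stage: reuse the pretrained encoder with a small labeled subset,
# mirroring the study's comparison against training without pretext pre-training.
n_labeled = 1000                                  # illustrative labeled-data budget
clf = tf.keras.Sequential([encoder, tf.keras.layers.Dense(10, activation="softmax")])
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
clf.fit(x_train[:n_labeled], y_train[:n_labeled], epochs=5, batch_size=64)
```

Running the downstream stage with and without the pretext stage, at varying values of `n_labeled`, reproduces the kind of comparison the study describes.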