Traditional self-supervised learning requires CNNs to use external pretext tasks (e.g., image- or video-based tasks) to encode high-level semantic visual representations. In this paper, we show that feature transformations within CNNs can also serve as supervisory signals for constructing a self-supervised task, which we call the \emph{internal pretext task}, and that such a task can be applied to enhance supervised learning. Specifically, we first transform the internal feature maps by discarding different channels, and then define an additional internal pretext task of identifying the discarded channels. CNNs are trained to predict joint labels generated by combining the self-supervised labels with the original labels. In this way, the network learns which channels are missing while classifying, which encourages it to mine richer feature information. Extensive experiments show that our approach is effective across various models and datasets while incurring only negligible computational overhead. Furthermore, our approach is compatible with other methods and can be combined with them for better results.
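The two core operations described above, discarding a group of channels and forming the joint label, can be sketched as follows. This is a minimal illustration under our own assumptions (contiguous channel groups, zeroing as the discard operation, and `label * G + g` as the joint-label encoding); the helper names are hypothetical and not the exact implementation.

```python
import numpy as np

def discard_channel_group(feature, group_idx, num_groups=4):
    """Zero out one contiguous group of channels in a (C, H, W) feature map.

    Hypothetical sketch of the channel-discarding transformation: the
    C channels are split into `num_groups` contiguous groups and the
    group at `group_idx` is set to zero.
    """
    c = feature.shape[0]
    assert c % num_groups == 0, "channels must divide evenly into groups"
    group_size = c // num_groups
    out = feature.copy()
    out[group_idx * group_size:(group_idx + 1) * group_size] = 0.0
    return out

def joint_label(original_label, group_idx, num_groups=4):
    """Combine the original class label with the self-supervised label
    (the index of the discarded channel group) into a single joint label,
    so a K-class problem becomes a (K * num_groups)-class problem."""
    return original_label * num_groups + group_idx
```

In training, each input would be paired with a randomly chosen `group_idx`, its internal feature map transformed by `discard_channel_group`, and the network supervised with `joint_label`; at test time the self-supervised half of the label can be marginalized out.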