Deep learning has achieved revolutionary advances in diverse applications, given large-scale labeled datasets. However, collecting sufficient labeled data is prohibitively time-consuming and labor-intensive in most realistic scenarios. To mitigate the requirement for labeled data, semi-supervised learning (SSL) focuses on exploring labeled and unlabeled data simultaneously, while transfer learning (TL) popularizes the favorable practice of fine-tuning a pre-trained model on the target data. A dilemma thus arises: without a decent pre-trained model to provide implicit regularization, SSL via self-training from scratch is easily misled by inaccurate pseudo-labels, especially in a large label space; without exploring the intrinsic structure of unlabeled data, TL via fine-tuning on limited labeled data risks under-transfer caused by model shift. To escape this dilemma, we present Self-Tuning, a novel approach to data-efficient deep learning that unifies the exploration of labeled and unlabeled data with the transfer of a pre-trained model. Further, to address the challenge of confirmation bias in self-training, a Pseudo Group Contrast (PGC) mechanism is devised to mitigate the reliance on pseudo-labels and boost the tolerance to false labels. Self-Tuning outperforms its SSL and TL counterparts on five tasks by sharp margins, e.g., it doubles the accuracy of fine-tuning on Cars with 15% labels.
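The core idea behind Pseudo Group Contrast can be illustrated with a minimal sketch: instead of contrasting a query against a single positive key, the query is pulled toward the whole *group* of keys sharing its pseudo-label, so a few wrongly pseudo-labeled keys are outvoted by the correct majority. The function below is an illustrative toy implementation in NumPy (the function name, signature, and temperature value are our own assumptions for exposition, not the authors' code):

```python
import numpy as np

def pseudo_group_contrast(query, keys, key_pseudo, query_pseudo, tau=0.07):
    """Illustrative sketch of a group-wise contrastive loss (assumed form).

    query:        (d,)   L2-normalized query embedding
    keys:         (n, d) L2-normalized key embeddings
    key_pseudo:   (n,)   pseudo-labels of the keys
    query_pseudo: scalar pseudo-label assigned to the query
    tau:          softmax temperature (hypothetical default)

    Positives are ALL keys whose pseudo-label matches the query's, so one
    mislabeled key only dilutes, rather than dominates, the positive set.
    """
    sims = keys @ query / tau                 # (n,) cosine similarities / tau
    logits = np.exp(sims - sims.max())        # numerically stabilized softmax terms
    positives = key_pseudo == query_pseudo    # boolean mask of the pseudo-label group
    # -log( sum over the positive group / sum over all keys )
    return -np.log(logits[positives].sum() / logits.sum())
```

As a sanity check, a query aligned with its pseudo-label group yields a near-zero loss, while a query assigned to the wrong group yields a large loss.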