Recently proposed self-supervised learning (SSL) approaches have successfully demonstrated the great potential of supplementing learning algorithms with additional unlabeled data. However, it remains unclear whether existing SSL algorithms can fully utilize the information in both labeled and unlabeled data. This paper gives an affirmative answer for the reconstruction-based SSL algorithm \citep{lee2020predicting} under several statistical models. While the existing literature only establishes upper bounds on the convergence rate, we provide a rigorous minimax analysis and justify the rate optimality of the reconstruction-based SSL algorithm under different data-generation models. Furthermore, we incorporate reconstruction-based SSL into existing adversarial training algorithms and show that learning from unlabeled data helps improve robustness.
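For concreteness, the following is a minimal sketch of the two-stage reconstruction-based procedure in a linear setting (the notation here is illustrative, not taken from the cited work): each input is split into two views $x = (x_1, x_2)$, a representation is learned on the unlabeled set $\mathcal{U}$ by predicting one view from the other, and the downstream predictor is then fit on the labeled set $\mathcal{L}$:
\begin{equation*}
\widehat{W} \in \arg\min_{W} \sum_{i \in \mathcal{U}} \big\| x_{2,i} - W^{\top} x_{1,i} \big\|_2^2,
\qquad
\widehat{\beta} \in \arg\min_{\beta} \sum_{i \in \mathcal{L}} \big( y_i - \beta^{\top} \widehat{W}^{\top} x_{1,i} \big)^2 .
\end{equation*}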