Our work reveals a structural shortcoming of existing mainstream self-supervised learning methods. Whereas self-supervised learning frameworks usually take the prevailing assumption of perfect instance-level invariance for granted, we carefully investigate the pitfalls behind it. In particular, we argue that the existing augmentation pipelines for generating multiple positive views naturally introduce out-of-distribution (OOD) samples that undermine learning on downstream tasks; generating diverse positive augmentations of the input does not always pay off for downstream performance. To overcome this inherent deficiency, we introduce UOTA, a lightweight latent variable model that targets the view-sampling issue in self-supervised learning. UOTA adaptively searches for the most important sampling regions from which to produce views, and offers a viable route toward outlier-robust self-supervised learning. Our method generalizes directly to many mainstream self-supervised learning approaches, regardless of whether their loss is contrastive. We empirically show UOTA's advantage over state-of-the-art self-supervised paradigms by an evident margin, which justifies the existence of the OOD sample issue embedded in existing approaches. Moreover, we theoretically prove that the merits of the proposal boil down to guaranteed reduction of estimator variance and bias. Code is available at https://github.com/ssl-codelab/uota.
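To make the idea of down-weighting suspected OOD views concrete, the following is a minimal, illustrative sketch of a weighted InfoNCE-style objective. It is not the UOTA algorithm from the repository above: the weighting rule (similarity of each view embedding to the batch mean direction) and the names `view_weights` and `weighted_info_nce` are hypothetical stand-ins chosen only to show how a per-view importance weight can be attached to a generic contrastive loss.

```python
# Illustrative sketch only; the OOD-weighting heuristic below is an assumption,
# not the method described in the paper or implemented in ssl-codelab/uota.
import torch
import torch.nn.functional as F


def view_weights(z: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Assign a weight to each view embedding; views far from the batch mean
    direction (treated here as likely OOD) receive smaller weights."""
    z = F.normalize(z, dim=1)
    center = F.normalize(z.mean(dim=0, keepdim=True), dim=1)
    sim_to_center = (z * center).sum(dim=1)                    # in [-1, 1]
    return torch.softmax(sim_to_center / tau, dim=0) * z.shape[0]  # mean ~ 1


def weighted_info_nce(z1: torch.Tensor, z2: torch.Tensor,
                      temperature: float = 0.2) -> torch.Tensor:
    """InfoNCE between two batches of positive views, with each positive pair
    re-weighted by the (hypothetical) importance weight of its second view."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                         # (N, N) similarities
    labels = torch.arange(z1.shape[0], device=z1.device)       # positives on diagonal
    per_pair = F.cross_entropy(logits, labels, reduction="none")
    w = view_weights(z2).detach()                              # no gradient through weights
    return (w * per_pair).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
    print(float(weighted_info_nce(z1, z2)))
```

Because the weights only rescale per-pair loss terms, the same wrapper pattern applies to non-contrastive objectives as well, which is in the spirit of the claim that the approach is agnostic to whether the loss is contrastive.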