Despite recent success, most contrastive self-supervised learning methods are domain-specific, relying heavily on data augmentation techniques that require knowledge about a particular domain, such as image cropping and rotation. To overcome this limitation, we propose a novel domain-agnostic approach to contrastive learning, named DACL, that is applicable to domains where invariances, and thus data augmentation techniques, are not readily available. Key to our approach is the use of Mixup noise to create similar and dissimilar examples by mixing data samples differently, either at the input or hidden-state levels. To demonstrate the effectiveness of DACL, we conduct experiments across various domains such as tabular data, images, and graphs. Our results show that DACL not only outperforms other domain-agnostic noising methods, such as Gaussian noise, but also combines well with domain-specific methods, such as SimCLR, to improve self-supervised visual representation learning. Finally, we theoretically analyze our method and show advantages over the Gaussian-noise-based contrastive learning approach.
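To make the mixing step concrete, the following is a minimal sketch of how a Mixup-based positive view could be constructed for a batch of samples. It assumes PyTorch; the function name `mixup_positive` and the sampling range for the mixing coefficient are illustrative choices, not the paper's exact hyperparameters. The same mixing operation can be applied to hidden representations rather than raw inputs.

```python
import torch

def mixup_positive(x, lam_low=0.9, lam_high=1.0):
    """Create a Mixup-based positive view of a batch x.

    Each sample is mixed with a randomly chosen other sample from the
    batch, using a coefficient lambda close to 1 so that the mixture
    remains dominated by (and hence "similar" to) the original sample.
    The range (lam_low, lam_high) is an illustrative assumption.
    """
    lam = torch.empty(x.size(0), 1).uniform_(lam_low, lam_high)
    perm = torch.randperm(x.size(0))        # random mixing partner per sample
    return lam * x + (1.0 - lam) * x[perm]  # convex combination of samples

# Usage sketch: two independently mixed views of the same batch form the
# positive pair, while other samples in the batch act as negatives, as in
# standard contrastive objectives (e.g., InfoNCE).
x = torch.randn(256, 32)  # e.g., a batch of tabular feature vectors
view1, view2 = mixup_positive(x), mixup_positive(x)
```

Because mixing only requires convex combinations of samples, this construction needs no domain knowledge, which is what makes the approach applicable beyond images.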