Many recent methods for unsupervised representation learning train models to be invariant to different "views," or distorted versions of an input. However, designing these views requires considerable trial and error by human experts, hindering widespread adoption of unsupervised representation learning methods across domains and modalities. To address this, we propose viewmaker networks: generative models that learn to produce useful views from a given input. Viewmakers are stochastic bounded adversaries: they produce views by generating and then adding an $\ell_p$-bounded perturbation to the input, and are trained adversarially with respect to the main encoder network. Remarkably, when pretraining on CIFAR-10, our learned views enable transfer accuracy comparable to the well-tuned SimCLR augmentations -- despite not including transformations like cropping or color jitter. Furthermore, our learned views significantly outperform baseline augmentations on speech recordings (+9 percentage points, on average) and wearable sensor data (+17 percentage points). Viewmakers can also be combined with handcrafted views: they improve robustness to common image corruptions and can increase transfer performance in cases where handcrafted views are less explored. These results suggest that viewmakers may provide a path towards more general representation learning algorithms -- reducing the domain expertise and effort needed to pretrain on a much wider set of domains. Code is available at https://github.com/alextamkin/viewmaker.
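To make the "stochastic bounded adversary" mechanism concrete, below is a minimal PyTorch sketch, not the authors' implementation (see the linked repository for that): the convolutional architecture, the value of the distortion budget, and the scaling-based projection onto the $\ell_p$ ball are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class Viewmaker(nn.Module):
    """Illustrative stochastic bounded adversary (not the paper's exact model).

    Generates a perturbation from the input plus a random noise channel
    (making the views stochastic), constrains the perturbation's l_p norm
    to a distortion budget, and adds it to the input to produce a "view".
    """

    def __init__(self, channels=3, budget=0.05, p=1):
        super().__init__()
        self.budget = budget  # assumed per-element distortion budget
        self.p = p
        # Placeholder architecture; the paper uses a deeper image-to-image net.
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x):
        # Random noise channel: each call yields a different view of x.
        noise = torch.rand(x.size(0), 1, x.size(2), x.size(3), device=x.device)
        delta = self.net(torch.cat([x, noise], dim=1))
        # Scale the perturbation so its l_p norm respects the budget; the
        # ball's radius grows with the number of elements per example.
        radius = self.budget * x[0].numel()
        norm = delta.flatten(1).norm(p=self.p, dim=1, keepdim=True)
        scale = (radius / norm.clamp(min=1e-8)).clamp(max=1.0)
        delta = delta * scale.view(-1, 1, 1, 1)
        return (x + delta).clamp(0.0, 1.0)  # keep views in valid pixel range


if __name__ == "__main__":
    vm = Viewmaker()
    x = torch.rand(4, 3, 32, 32)  # e.g. a CIFAR-10-sized batch
    views = vm(x)
    # Each example's l1 distortion stays within budget * num_elements.
    print(views.shape, (views - x).flatten(1).norm(p=1, dim=1))
```

In pretraining, two such views of each input would be passed through the encoder, with the viewmaker updated by gradient ascent on the encoder's contrastive loss (e.g., SimCLR's NT-Xent) while the encoder descends on it -- this adversarial game is what pushes the views toward being maximally informative distortions within the budget.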