The success of deep neural networks often relies on a large number of labeled examples, which can be difficult to obtain in many real-world scenarios. To address this challenge, unsupervised methods are strongly preferred for training neural networks without any labeled data. In this paper, we present a novel paradigm of unsupervised representation learning by Auto-Encoding Transformation (AET), in contrast to the conventional Auto-Encoding Data (AED) approach. Given a randomly sampled transformation, AET seeks to predict it as accurately as possible at the output end, merely from the encoded features of the original and transformed images. The idea is simple: as long as the unsupervised features successfully encode the essential information about the visual structures of the original and transformed images, the transformation can be well predicted. We will show that this AET paradigm allows us to instantiate a large variety of transformations, from parameterized to non-parameterized and GAN-induced ones. Our experiments show that AET greatly improves over existing unsupervised approaches, setting new state-of-the-art performance that comes much closer to the upper bounds set by the fully supervised counterparts on the CIFAR-10, ImageNet, and Places datasets.
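To make the AET paradigm concrete, the following is a minimal sketch of one training step for a parameterized transformation (here, a 2D rotation regressed by its angle). It assumes PyTorch; the Encoder and TransformDecoder architectures, the hyperparameters, and the helper name aet_step are illustrative assumptions, not the networks used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms.functional as TF

class Encoder(nn.Module):
    """Toy stand-in for the shared feature encoder E(.) (an assumption)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class TransformDecoder(nn.Module):
    """Predicts the transformation parameters from the pair of encodings."""
    def __init__(self, feat_dim=128, n_params=1):
        super().__init__()
        self.fc = nn.Linear(2 * feat_dim, n_params)

    def forward(self, z_orig, z_trans):
        return self.fc(torch.cat([z_orig, z_trans], dim=1))

def aet_step(encoder, decoder, optimizer, x):
    # Randomly sample a transformation: a rotation angle in [-180, 180).
    angles = torch.empty(x.size(0)).uniform_(-180.0, 180.0)
    x_t = torch.stack([TF.rotate(img, a.item()) for img, a in zip(x, angles)])

    # Encode the original and transformed images with the SAME encoder.
    z, z_t = encoder(x), encoder(x_t)

    # Decode (i.e., predict) the sampled transformation and regress it.
    pred = decoder(z, z_t).squeeze(1)
    loss = F.mse_loss(pred, angles / 180.0)  # normalized angle as target

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The same loop generalizes to other transformation families by swapping the sampling step and the regression target, e.g., the six parameters of an affine warp for parameterized transformations, or a learned representation of the transformation for non-parameterized and GAN-induced ones.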