Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward way to map $n$-dimensional data in input space to a lower $m$-dimensional representation space and back. The decoder itself defines an $m$-dimensional manifold in input space. Inspired by manifold learning, we show that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and similarly for other loss functions. We derive expressions for the number of samples needed to specify the encoder and decoder and show that the decoder generally requires far fewer training samples to be well-specified than the encoder. We discuss the training of autoencoders from this perspective and relate it to previous work in the field that uses noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrate that the decoder is much better suited to learning a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further show that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented will also contribute to an improved conceptual understanding of representation learning.
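The decoder-only training described above can be illustrated with a minimal sketch: a linear decoder whose weights are optimized jointly with one learnable latent code per training sample, using plain gradient descent on the sum-of-squares reconstruction loss. This is a toy example under assumed settings (data, dimensions, learning rate, and step count are all chosen for illustration), not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N points in n-dimensional input space lying near an
# m-dimensional linear manifold (plus a little noise).
N, n, m = 200, 10, 2
true_codes = rng.normal(size=(N, m))
true_map = rng.normal(size=(n, m))
X = true_codes @ true_map.T + 0.01 * rng.normal(size=(N, n))

# Decoder-only training: the latent codes Z are free parameters,
# optimized jointly with the decoder weights W.
Z = 0.1 * rng.normal(size=(N, m))   # one learnable code per sample
W = 0.1 * rng.normal(size=(n, m))   # linear decoder: x_hat = W @ z
lr = 1e-3                           # assumed learning rate for the toy setup

for step in range(5000):
    R = X - Z @ W.T        # reconstruction residuals
    grad_Z = -2.0 * R @ W  # gradient of sum-of-squares loss w.r.t. codes
    grad_W = -2.0 * R.T @ Z  # gradient w.r.t. decoder weights
    Z -= lr * grad_Z
    W -= lr * grad_W

mse = np.mean((X - Z @ W.T) ** 2)
print(f"final reconstruction MSE: {mse:.6f}")
```

With a sum-of-squares loss, each gradient step on `Z` moves a sample's point on the decoder manifold toward that sample in Euclidean distance, while the steps on `W` reshape the manifold itself; no encoder is needed at any point.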