Autoencoders, which consist of an encoder and a decoder, are widely used in machine learning for dimension reduction of high-dimensional data. The encoder embeds the input data manifold into a lower-dimensional latent space, while the decoder represents the inverse map, providing a parametrization of the data manifold by the manifold in latent space. Good regularity and structure of the embedded manifold may substantially simplify further data processing tasks such as cluster analysis or data interpolation. We propose and analyze a novel regularization for learning the encoder component of an autoencoder: a loss functional that prefers isometric, extrinsically flat embeddings and allows the encoder to be trained on its own. To perform the training, it is assumed that for pairs of nearby points on the input manifold their local Riemannian distance and their local Riemannian average can be evaluated. The loss functional is computed via Monte Carlo integration with different sampling strategies for pairs of points on the input manifold. Our main theorem identifies a geometric loss functional of the embedding map as the $\Gamma$-limit of the sampling-dependent loss functionals. Numerical tests, using image data that encodes different explicitly given data manifolds, show that smooth manifold embeddings into latent space are obtained. Due to the promotion of extrinsic flatness, these embeddings are regular enough that interpolation between not-too-distant points on the manifold is well approximated by linear interpolation in latent space as one possible postprocessing.
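To make the sampling idea concrete, the following is a minimal sketch of an isometry-promoting term, assuming one can evaluate the local Riemannian distance for sampled pairs of nearby points. It is not the paper's actual functional (which also penalizes extrinsic curvature and involves the local Riemannian average); it only illustrates the Monte Carlo structure: a mean over sampled point pairs of the squared mismatch between manifold distance and latent distance.

```python
import numpy as np

def isometry_loss(encode, pairs, manifold_dist):
    """Monte Carlo estimate of an isometry-promoting loss:
    the mean squared mismatch between the Riemannian distance of a
    sampled pair on the input manifold and the Euclidean distance of
    its image in latent space (a simplified illustrative sketch)."""
    errs = []
    for p, q in pairs:
        d_m = manifold_dist(p, q)                  # distance on the manifold
        d_z = np.linalg.norm(encode(p) - encode(q))  # distance in latent space
        errs.append((d_z - d_m) ** 2)
    return float(np.mean(errs))

# Toy data manifold: the unit circle embedded in R^3 (third coordinate 0).
# Pairs of nearby points are sampled at a fixed angular separation eps,
# for which the geodesic (arc-length) distance is simply eps.
rng = np.random.default_rng(0)
eps = 0.05
thetas = rng.uniform(0.0, 2.0 * np.pi, size=200)

def point(t):
    return np.array([np.cos(t), np.sin(t), 0.0])

pairs = [(point(t), point(t + eps)) for t in thetas]
arc = lambda p, q: eps  # all sampled pairs have arc length eps

good = lambda x: x[:2]        # near-isometric projection onto R^2
bad = lambda x: 2.0 * x[:2]   # scales all distances, violating isometry

loss_good = isometry_loss(good, pairs, arc)  # tiny (chord vs. arc only)
loss_bad = isometry_loss(bad, pairs, arc)    # much larger
```

For the near-isometric encoder the only error is the chord-versus-arc discrepancy, which is of order $\varepsilon^3$ per pair, so the loss is negligible; the distance-doubling encoder incurs a mismatch of order $\varepsilon$ per pair and is clearly penalized.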