The low-dimensional manifold hypothesis posits that the data found in many applications, such as those involving natural images, lie (approximately) on low-dimensional manifolds embedded in a high-dimensional Euclidean space. In this setting, a typical neural network defines a function that takes a finite number of vectors in the embedding space as input. However, one often needs to evaluate the optimized network at points outside the training distribution. This paper considers the case in which the training data are distributed in a linear subspace of $\mathbb R^d$. We derive estimates on the variation of the learning function, defined by a neural network, in the direction transversal to the subspace. We study the potential regularization effects associated with the network's depth and with noise in the codimension of the data manifold. We also present additional side effects that arise in training due to the presence of noise.
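To make the setting concrete, the sketch below is an illustrative experiment, not a construction from the paper: the dimensions $d=10$, $k=3$, the architecture, and the target function are assumptions chosen only for demonstration. It trains a small PyTorch network on inputs confined to a $k$-dimensional coordinate subspace of $\mathbb R^d$ and then probes the gradient of the learned function along directions transversal to that subspace, which is the kind of off-subspace variation the estimates concern.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d, k, n = 10, 3, 2000  # ambient dimension, subspace dimension, sample count (illustrative choices)

# Training inputs confined to the k-dimensional coordinate subspace of R^d.
Z = torch.randn(n, k)
X = torch.zeros(n, d)
X[:, :k] = Z
# Target depends only on the subspace coordinates.
y = torch.sin(Z.sum(dim=1, keepdim=True))
# (Adding small noise to the remaining d - k coordinates of X would mimic
# the codimension noise whose regularization effect the abstract mentions.)

net = nn.Sequential(
    nn.Linear(d, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = ((net(X) - y) ** 2).mean()
    loss.backward()
    opt.step()

# Probe the variation of the learned function transversal to the data
# subspace: gradient components along the last d - k coordinate axes.
x0 = X[:1].clone().requires_grad_(True)
net(x0).sum().backward()
transversal = x0.grad[0, k:]
print("norm of gradient transversal to the subspace:", transversal.norm().item())
```

Since the training inputs never vary in the last $d-k$ coordinates, nothing in the loss directly constrains the network there; the printed transversal gradient norm is one simple proxy for how the learned function behaves off the data subspace.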