A basic problem in machine learning is to find a mapping $f$ from a low-dimensional latent space $\mathcal{Y}$ to a high-dimensional observation space $\mathcal{X}$. Modern tools such as deep neural networks can represent general non-linear mappings, so a learner can easily find a mapping that perfectly fits all the observations. However, such a mapping is often not considered good, because it is not simple enough and can overfit. How can we define simplicity? We give a formal definition of the amount of information imposed by a non-linear mapping $f$. Intuitively, we measure the local discrepancy between the pullback geometry and the intrinsic geometry of the latent space. Our definition is based on information geometry and is independent of both the empirical observations and any specific parameterization. We prove its basic properties and discuss relationships with related machine learning methods.
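To make the intuition concrete, here is a minimal sketch under assumed notation (the symbols $G_{\mathcal{X}}$, $G_{\mathcal{Y}}$, $D$, and $I(f)$ are illustrative, not the paper's exact construction): a smooth mapping $f$ with Jacobian $J_f(y)$ pulls back the geometry of $\mathcal{X}$ onto $\mathcal{Y}$, and the information imposed by $f$ is measured by how far this pullback deviates locally from the intrinsic geometry of the latent space:
\[
  G_f(y) \;=\; J_f(y)^{\top}\, G_{\mathcal{X}}\!\big(f(y)\big)\, J_f(y),
  \qquad
  I(f) \;=\; \int_{\mathcal{Y}} D\!\big(G_f(y) \,\Vert\, G_{\mathcal{Y}}(y)\big)\,\mathrm{d}y,
\]
where $D$ is some divergence between local metric tensors. In the information-geometric setting of the paper, the discrepancy is taken between the pulled-back and intrinsic statistical structures rather than between Riemannian metrics, which is what makes the definition independent of observations and parameterization.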