It has long been thought that the high-dimensional data encountered in many practical machine learning tasks have low-dimensional structure, i.e., that the manifold hypothesis holds. A natural question, then, is how to estimate the intrinsic dimension of a given population distribution from a finite sample. We introduce a new estimator of the intrinsic dimension and provide finite-sample, non-asymptotic guarantees. We then apply our techniques to obtain new sample-complexity bounds for Generative Adversarial Networks (GANs) that depend only on the intrinsic dimension of the data.
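For concreteness, the following is a minimal sketch of a standard nearest-neighbor-based intrinsic dimension estimator (the Levina–Bickel maximum-likelihood estimator), shown only to illustrate the estimation problem; it is not the estimator introduced here, and the function name, parameters, and synthetic data are illustrative assumptions.

```python
# A common baseline for intrinsic dimension estimation from a finite sample:
# the Levina-Bickel MLE based on k-nearest-neighbor distances.
import numpy as np
from scipy.spatial import cKDTree


def mle_intrinsic_dimension(X: np.ndarray, k: int = 10) -> float:
    """Estimate the intrinsic dimension of a point cloud X of shape (n, D).

    For each point, the k nearest-neighbor distances T_1 <= ... <= T_k yield a
    local maximum-likelihood estimate; the local inverse estimates are averaged
    (MacKay-Ghahramani correction) before inverting.
    """
    tree = cKDTree(X)
    # query returns the point itself as the nearest neighbor, so ask for k + 1
    dists, _ = tree.query(X, k=k + 1)
    dists = dists[:, 1:]  # drop the zero distance to the point itself
    # local inverse-dimension estimates: (1/(k-1)) * sum_j log(T_k / T_j)
    log_ratios = np.log(dists[:, -1][:, None] / dists[:, :-1])
    inv_dims = log_ratios.mean(axis=1)
    return 1.0 / inv_dims.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 2-dimensional latent data embedded linearly in a 10-dimensional ambient space
    latent = rng.standard_normal((2000, 2))
    X = latent @ rng.standard_normal((2, 10))
    print(mle_intrinsic_dimension(X, k=10))  # should be close to 2
```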