In this article, we study a curvature-like feature value of data sets in Euclidean spaces. First, we formulate curvature functions with desirable properties under the manifold hypothesis. We then derive, from the law of large numbers, a test property for the validity of a curvature function, and check numerically that the function we construct satisfies it. These experiments also suggest the conjecture that the mean of the curvature of sample manifolds coincides with the curvature of the mean manifold. Our construction is based on dimension estimation by principal component analysis and on the Gaussian curvature of hypersurfaces. Our function depends on provisional parameters $\varepsilon, \delta$, and we suggest treating the resulting values as a function of these parameters to gain some robustness. As an application, we propose a method to decompose data sets into parts that reflect local structure. For this, we embed the data sets into a higher-dimensional Euclidean space using curvature values and cluster them in the embedding space. We also give computational experiments that support the effectiveness of our methods.
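To make the two ingredients named above concrete, the following Python sketch combines local PCA dimension estimation with a Gaussian curvature estimate for a hypersurface. It is only an illustration under stated assumptions, not the construction of this article: here `eps` is assumed to be a neighbourhood radius and `delta` an eigenvalue-ratio threshold (the abstract does not define $\varepsilon, \delta$), and the function names `local_pca`, `estimate_dimension` and `curvature_at` are hypothetical.

```python
import numpy as np


def local_pca(points, x, eps):
    """Neighbours of x within radius eps, with PCA eigenpairs sorted descending."""
    nbrs = points[np.linalg.norm(points - x, axis=1) <= eps]
    centered = nbrs - nbrs.mean(axis=0)
    cov = centered.T @ centered / len(nbrs)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return nbrs, eigvals[order], eigvecs[:, order]


def estimate_dimension(eigvals, delta):
    """Count eigenvalues that are at least delta times the largest one.

    Assumption: delta acts as a relative threshold on the PCA spectrum.
    """
    return int(np.sum(eigvals >= delta * eigvals[0]))


def curvature_at(points, x, eps, delta):
    """Return (estimated dimension d, Gaussian curvature estimate at x).

    The neighbourhood is treated as a graph h(u) over the d leading PCA
    directions, with the (d+1)-th PCA direction as the normal. If the
    codimension exceeds one, the remaining directions are ignored, which
    is a simplification made for this sketch.
    """
    nbrs, eigvals, eigvecs = local_pca(points, x, eps)
    d = estimate_dimension(eigvals, delta)
    if d >= points.shape[1]:
        raise ValueError("estimated dimension leaves no normal direction")
    tangent, normal = eigvecs[:, :d], eigvecs[:, d]

    # Local coordinates relative to the query point x.
    u = (nbrs - x) @ tangent
    h = (nbrs - x) @ normal

    # Least-squares fit h ~ c + b.u + sum_{i<=j} q_ij u_i u_j.
    iu, ju = np.triu_indices(d)
    design = np.hstack([np.ones((len(u), 1)), u, u[:, iu] * u[:, ju]])
    coef, *_ = np.linalg.lstsq(design, h, rcond=None)
    grad = coef[1:1 + d]
    q = coef[1 + d:]

    # Hessian of the fitted quadratic: H_ii = 2 q_ii, H_ij = q_ij for i != j.
    hess = np.zeros((d, d))
    hess[iu, ju] = q
    hess[ju, iu] = q
    hess[np.diag_indices(d)] *= 2

    # Gaussian (Gauss-Kronecker) curvature of a graph hypersurface.
    # For odd d the sign depends on the arbitrary orientation of the normal.
    k = np.linalg.det(hess) / (1.0 + grad @ grad) ** ((d + 2) / 2)
    return d, k


# Usage: points sampled on the unit sphere in R^3.
rng = np.random.default_rng(0)
sphere = rng.normal(size=(4000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
print(curvature_at(sphere, sphere[0], eps=0.3, delta=0.1))
```

On the unit sphere the estimated dimension should be 2 and the curvature estimate should be close to 1, with a small bias that grows with `eps` because the quadratic fit ignores higher-order terms. Repeating the call over a grid of `eps` and `delta` values is one way to view the result as a function of the parameters, in the spirit of the robustness remark above.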