In this article, we study curvature-like feature values of data sets in Euclidean spaces. First, we formulate such curvature functions with desirable properties under the manifold hypothesis. Then, based on the law of large numbers, we propose a test property for the validity of a curvature function and check it by numerical experiments for the function we construct. These experiments also lead us to conjecture that the mean of the curvatures of sample manifolds coincides with the curvature of the mean manifold. Our construction is based on dimension estimation by principal component analysis and on the Gaussian curvature of hypersurfaces. Our function depends on provisional parameters $\varepsilon, \delta$, and we suggest treating the resulting values as functions of these parameters to gain some robustness. As an application, we propose a method to decompose data sets into parts reflecting their local structure. For this, we embed the data sets into a higher-dimensional Euclidean space using curvature values and cluster them in the embedded space. We also give some computational experiments that support the effectiveness of our methods.
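The abstract names the ingredients but not the exact construction. The following is only a minimal NumPy sketch of one way they could fit together, under our own assumptions: $\varepsilon$ is the radius of the Euclidean ball used as the neighborhood of a point, $\delta$ is the fraction of variance that PCA may discard when estimating the local dimension $d$, the neighborhood is treated as a hypersurface in the span of its first $d+1$ principal directions, and the curvature is the Gauss-Kronecker curvature $K = \det(\mathrm{Hess}\,h)/(1+|\nabla h|^2)^{(d+2)/2}$ of a least-squares quadratic graph $h$ fitted in that frame. The function names `local_dimension`, `gauss_kronecker_curvature`, and `curvature_embedding`, and the choice to embed a point by appending its curvature values at several $\varepsilon$ to its coordinates, are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def local_dimension(neighbors, delta):
    """Estimate local dimension by PCA (assumed rule, not the paper's).

    d is the smallest number of principal components whose cumulative
    explained-variance ratio reaches 1 - delta. Also returns the principal
    axes as rows, sorted by decreasing variance.
    """
    centered = neighbors - neighbors.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2  # proportional to the variances along the rows of vt
    ratio = np.cumsum(var) / var.sum()
    d = int(np.searchsorted(ratio, 1.0 - delta) + 1)
    return d, vt


def gauss_kronecker_curvature(X, p, eps, delta):
    """Curvature-like value at p from the points of X in the eps-ball around p."""
    neighbors = X[np.linalg.norm(X - p, axis=1) < eps]
    if len(neighbors) < 2:
        return float("nan")
    d, axes = local_dimension(neighbors, delta)
    n_params = 1 + d + d * (d + 1) // 2
    if axes.shape[0] < d + 1 or len(neighbors) < n_params:
        return float("nan")  # no normal direction, or too few points to fit
    # Local frame at p: d tangent axes plus one normal axis (hypersurface assumption).
    coords = (neighbors - p) @ axes[: d + 1].T
    u, h = coords[:, :d], coords[:, d]
    # Least-squares fit h ~ c0 + g.u + sum_{i<=j} c_ij u_i u_j.
    iu, ju = np.triu_indices(d)
    design = np.hstack([np.ones((len(u), 1)), u, u[:, iu] * u[:, ju]])
    coef, *_ = np.linalg.lstsq(design, h, rcond=None)
    g = coef[1 : d + 1]
    hess = np.zeros((d, d))
    hess[iu, ju] = coef[d + 1 :]
    # Diagonal becomes 2*c_ii; off-diagonal c_ij is mirrored below the diagonal.
    hess = hess + hess.T
    # Gauss-Kronecker curvature of the graph of h at u = 0 (the point p).
    return float(np.linalg.det(hess) / (1.0 + g @ g) ** ((d + 2) / 2))


def curvature_embedding(X, eps_values, delta):
    """Append each point's curvature values at several scales eps to its coordinates."""
    curv = np.array(
        [[gauss_kronecker_curvature(X, x, e, delta) for e in eps_values] for x in X]
    )
    return np.hstack([X, curv])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(20000, 3))
    sphere = 2.0 * pts / np.linalg.norm(pts, axis=1, keepdims=True)
    # For a sphere of radius r = 2 this should be roughly 1 / r**2 = 0.25.
    print(gauss_kronecker_curvature(sphere, np.array([0.0, 0.0, 2.0]), eps=0.4, delta=0.05))
```

The quadratic fit introduces a small bias that grows with $\varepsilon$, which is one reason to look at the values over a range of $\varepsilon$ rather than at a single choice. Points whose neighborhoods are too sparse get `NaN`; after dropping or imputing those rows, any standard clustering algorithm, such as k-means, could be run on the embedded points.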