近似古老分析的概率分析方法 (Probabilistic methods for approximate archetypal analysis)

Archetypal analysis is an unsupervised learning method for exploratory data analysis. One major challenge that limits the applicability of archetypal analysis in practice is the inherent computational complexity of the existing algorithms. In this paper, we provide a novel approximation approach to partially address this issue. Utilizing probabilistic ideas from high-dimensional geometry, we introduce two preprocessing techniques to reduce the dimension and representation cardinality of the data, respectively. We prove that, provided the data is approximately embedded in a low-dimensional linear subspace and the convex hull of the corresponding representations is well approximated by a polytope with a few vertices, our method can effectively reduce the scaling of archetypal analysis. Moreover, the solution of the reduced problem is near-optimal in terms of prediction errors. Our approach can be combined with other acceleration techniques to further mitigate the intrinsic complexity of archetypal analysis. We demonstrate the usefulness of our results by applying our method to summarize several moderately large-scale datasets.

翻译：考古学分析是一种未经监督的探索性数据分析学习方法。限制考古学分析实际适用性的一个主要挑战是现有算法的内在计算复杂性。在本文中,我们为部分解决这一问题提供了一种新的近似方法。我们利用高维几何学的概率性设想,分别采用两种预处理技术来降低数据的尺寸和表示度。我们证明,只要数据大致嵌入一个低维线性子空间,而相应的表层的锥形体被一个有少量脊椎的聚极体所近似,我们的方法就能有效地减少原形分析的规模。此外,在预测误差方面,减少问题的解决方案接近最佳。我们的方法可以与其他加速技术相结合,进一步减轻原形分析的内在复杂性。我们通过应用我们的方法总结几个中等大型数据集,来证明我们结果的有用性。