Archetypal analysis is a matrix factorization method with convexity constraints. Because the objective has local minima, a good initialization is essential. Frequently used initialization methods either yield sub-optimal starting points or are prone to getting stuck in poor local minima. In this paper, we propose archetypal analysis++ (AA++), a probabilistic initialization strategy for archetypal analysis that sequentially samples points based on their influence on the objective, similar to $k$-means++. In fact, we argue that $k$-means++ already approximates the proposed initialization method. Furthermore, we suggest adapting an efficient Monte Carlo approximation of $k$-means++ to AA++. In an extensive empirical evaluation on 13 real-world data sets of varying sizes and dimensionalities, and considering two pre-processing strategies, we show that AA++ almost consistently outperforms all baselines, including the most frequently used ones.
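For intuition about the sequential sampling that the abstract compares AA++ to, the following is a minimal sketch of standard $k$-means++ seeding: the first point is drawn uniformly, and each later point is drawn with probability proportional to its squared distance to the nearest point already selected. This is not the AA++ procedure itself, whose sampling weights are not specified in this abstract. The function name `kmeanspp_seeding` and its parameters are illustrative assumptions.

```python
import numpy as np


def kmeanspp_seeding(X, k, rng=None):
    """Return indices of k rows of X chosen by k-means++ (D^2) sampling.

    Illustrative sketch only: this is plain k-means++ seeding, not AA++.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]

    # First point: drawn uniformly at random.
    indices = [int(rng.integers(n))]

    # Squared distance from every point to its nearest selected point.
    d2 = np.sum((X - X[indices[0]]) ** 2, axis=1)

    for _ in range(1, k):
        total = d2.sum()
        if total == 0.0:
            # Every point coincides with a selected one; fall back to uniform.
            probs = np.full(n, 1.0 / n)
        else:
            probs = d2 / total

        # Sample the next point proportionally to its squared distance.
        nxt = int(rng.choice(n, p=probs))
        indices.append(nxt)

        # Update nearest-selected-point distances with the new point.
        d2 = np.minimum(d2, np.sum((X - X[nxt]) ** 2, axis=1))

    return np.array(indices)


if __name__ == "__main__":
    data = np.random.default_rng(0).normal(size=(200, 5))
    print(kmeanspp_seeding(data, k=4, rng=1))
```

The selected rows `X[indices]` could then serve as starting points for an iterative solver. Points far from those already chosen are favored, which tends to spread the initial points across the data.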