This paper presents a proposal of a faster Wasserstein $k$-means algorithm for histogram data by reducing Wasserstein distance computations and exploiting sparse simplex projection. We shrink data samples, centroids, and the ground cost matrix, which leads to considerable reduction of the computations used to solve optimal transport problems without loss of clustering quality. Furthermore, we dynamically reduced the computational complexity by removing lower-valued data samples and harnessing sparse simplex projection while keeping the degradation of clustering quality lower. We designate this proposed algorithm as sparse simplex projection based Wasserstein $k$-means, or SSPW $k$-means. Numerical evaluations conducted with comparison to results obtained using Wasserstein $k$-means algorithm demonstrate the effectiveness of the proposed SSPW $k$-means for real-world datasets
翻译:本文提出一个提案,通过减少瓦森斯坦距离计算和利用稀薄的简单投影,更快地用瓦森斯坦(Wasserstein)以KK美元为单位计算直方图数据。我们收缩了数据样本、机器人和地面成本矩阵,这导致大量减少用于解决最佳运输问题的计算,而不会丧失集群质量。此外,我们通过去除低价值数据样本,利用稀薄的简单x预测,同时保持组群质量的下降,从而动态地减少了计算的复杂性。我们把这一拟议的算法指定为以瓦森斯坦(Wasserste)美元为单位的稀薄简单投影法,或SSPW美元为单位。与使用瓦森(W)美元平均值计算的结果进行比较的数值评估表明,拟议的SSPW($k美元)手段对现实世界数据集的有效性。