This paper presents a clustering algorithm for histogram data based on Self-Organizing Map (SOM) learning. It combines dimension reduction by the SOM with clustering of the data in the reduced space. Because the data are distributions, a suitable dissimilarity measure between distributions is used: the $L_2$ Wasserstein distance. Moreover, the number of clusters is not fixed in advance but is determined automatically from a local estimate of the data density in the original space. Applications to synthetic and real data sets corroborate the proposed strategy.
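To make the dissimilarity measure concrete, the following is a minimal sketch of the $L_2$ Wasserstein distance between two one-dimensional histograms, computed from their quantile functions as $\sqrt{\int_0^1 (F^{-1}(t) - G^{-1}(t))^2\,dt}$. It is an illustration, not the paper's implementation: the names `quantile`, `wasserstein_l2`, and `n_grid` are hypothetical, mass is assumed to be uniformly spread within each bin, and the integral is approximated numerically with a midpoint rule.

```python
from bisect import bisect_left
from itertools import accumulate
from math import sqrt


def quantile(edges, weights, t):
    """Inverse CDF of a histogram at level t in (0, 1).

    Assumption: mass is uniformly distributed within each bin.
    edges has len(weights) + 1 entries; weights need not be normalized.
    """
    total = sum(weights)
    cdf = [c / total for c in accumulate(weights)]
    # Index of the first bin whose cumulative mass reaches t
    # (clamped to guard against floating-point rounding at the top end).
    i = min(bisect_left(cdf, t), len(weights) - 1)
    lower = cdf[i - 1] if i > 0 else 0.0
    mass = cdf[i] - lower
    frac = (t - lower) / mass if mass > 0 else 0.0
    # Linear interpolation inside the selected bin.
    return edges[i] + frac * (edges[i + 1] - edges[i])


def wasserstein_l2(edges_a, weights_a, edges_b, weights_b, n_grid=1000):
    """Approximate L2 Wasserstein distance between two histograms.

    Uses a midpoint rule with n_grid points on the quantile level t.
    """
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        diff = quantile(edges_a, weights_a, t) - quantile(edges_b, weights_b, t)
        total += diff * diff
    return sqrt(total / n_grid)


if __name__ == "__main__":
    # Two histograms with the same shape, one shifted by 3 units.
    a = ([0.0, 1.0, 2.0], [1.0, 1.0])
    b = ([3.0, 4.0, 5.0], [1.0, 1.0])
    print(wasserstein_l2(*a, *b))
```

For two histograms that differ only by a translation, the distance equals the size of the shift, which is one reason this measure suits distribution-valued data: it accounts for where the mass lies, not only for bin-by-bin differences.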