Spherical data are data distributed on a sphere. Such data appear in various fields, including meteorology, biology, and natural language processing. However, methods for analyzing spherical data are not yet sufficiently developed. One important open issue is estimating the number of clusters in spherical data. To address this issue, I propose a new method called Spherical X-means (SX-means), which estimates the number of clusters on a d-dimensional sphere. SX-means is a model-based method that assumes the data are generated from a mixture of von Mises-Fisher distributions. This paper describes the proposed method and demonstrates its performance in estimating the number of clusters.
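For readers unfamiliar with the model assumption, the standard form of a von Mises-Fisher mixture density can be written as follows. This is a sketch of the textbook definition, not notation taken from the paper: the symbols K (number of components), \pi_j (mixing weights), \mu_j (mean directions), \kappa_j (concentrations), and p (dimension of the ambient space, so that the data lie on the unit sphere in R^p) are assumptions introduced here, and the paper's own dimension convention for "d-dimensional sphere" may differ.

```latex
% Assumed notation: x and \mu_j are unit vectors in R^p, \kappa_j > 0,
% \pi_j \ge 0 with \sum_j \pi_j = 1, and I_\nu is the modified Bessel
% function of the first kind of order \nu.
p(x) = \sum_{j=1}^{K} \pi_j \, f(x \mid \mu_j, \kappa_j),
\qquad
f(x \mid \mu, \kappa) = C_p(\kappa) \exp\!\left(\kappa \, \mu^{\top} x\right),
\qquad
C_p(\kappa) = \frac{\kappa^{p/2 - 1}}{(2\pi)^{p/2} \, I_{p/2 - 1}(\kappa)}
```

Under this reading, estimating the number of clusters amounts to choosing K, the number of mixture components.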