Clustering data into meaningful subsets is a major task in scientific data analysis. To date, various strategies, ranging from model-based approaches to data-driven schemes, have been devised for efficient and accurate clustering. One important class of clustering methods of particular interest is the class of exemplar-based approaches. This interest stems primarily from the amount of compressed information encoded in the exemplars, which effectively reflect the major characteristics of their respective clusters. Affinity propagation (AP) has proven to be a powerful exemplar-based approach that refines the set of optimal exemplars through iterative pairwise message updates. However, a critical limitation of AP is its inability to capitalize on known network relations between data points, which are often available for scientific datasets. To mitigate this shortcoming, we propose geometric-AP, a novel clustering algorithm that extends AP to take advantage of the network topology. Geometric-AP obeys network constraints and uses max-sum belief propagation to leverage the available network topology for generating smooth clusters over the network. Extensive performance assessment reveals a significant enhancement in the quality of the clustering results when compared to benchmark clustering schemes. In particular, we demonstrate that geometric-AP performs extremely well even in cases where the original AP fails drastically.
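To make the "iterative pairwise message updates" concrete, the following is a minimal sketch of the standard (original) AP updates, in which responsibilities and availabilities are exchanged between all pairs of points. It is not geometric-AP: the abstract does not specify how the network constraints enter the messages, so that part is left out. The function names, the negative squared Euclidean similarity, the median preference, and the damping value are all assumptions chosen for illustration.

```python
import numpy as np


def negative_squared_distances(X):
    """Similarity assumed here: negative squared Euclidean distance."""
    diff = X[:, None, :] - X[None, :, :]
    return -np.sum(diff ** 2, axis=-1)


def affinity_propagation(S, preference=None, damping=0.5, n_iter=200):
    """Plain AP sketch (hypothetical helper, not the paper's geometric-AP).

    S is an (n, n) similarity matrix. Returns (exemplars, labels), where
    labels[i] is the index of the exemplar chosen by point i.
    """
    S = np.array(S, dtype=float)  # work on a copy
    n = S.shape[0]
    if preference is None:
        # Assumed default: median of the off-diagonal similarities.
        preference = np.median(S[~np.eye(n, dtype=bool)])
    np.fill_diagonal(S, preference)

    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)

    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first_max = AS[rows, best]
        AS[rows, best] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[rows, best] = S[rows, best] - second_max
        R = damping * R + (1.0 - damping) * R_new

        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = np.diag(A_new).copy()
        A_new = np.minimum(A_new, 0.0)
        np.fill_diagonal(A_new, self_avail)
        A = damping * A + (1.0 - damping) * A_new

    # A point is an exemplar when its self-responsibility plus
    # self-availability is positive.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels


if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    X = np.array([[0, 0], [0, 1], [1, 0],
                  [10, 10], [10, 11], [11, 10]], dtype=float)
    exemplars, labels = affinity_propagation(negative_squared_distances(X))
    print("exemplars:", exemplars)
    print("labels:", labels)
```

Note that this sketch sees only pairwise similarities. Any known network between the points is ignored, which is the limitation of AP that the abstract says geometric-AP is designed to address.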