Given all pairwise weights (distances) among a set of objects, filtered graphs provide a sparse representation by only keeping an important subset of weights. Such graphs can be passed to graph clustering algorithms to generate hierarchical clusters. In particular, the directed bubble hierarchical tree (DBHT) algorithm on filtered graphs has been shown to produce good hierarchical clusters for time series data. We propose a new parallel algorithm for constructing triangulated maximally filtered graphs (TMFG), which produces valid inputs for DBHT, and a scalable parallel algorithm for generating DBHTs that is optimized for TMFG inputs. In addition to parallelizing the original TMFG construction, which has limited parallelism, we also design a new algorithm that inserts multiple vertices on each round to enable more parallelism. We show that the graphs generated by our new algorithm have similar quality compared to the original TMFGs, while being much faster to generate. Our new parallel algorithms for TMFGs and DBHTs are 136--2483x faster than state-of-the-art implementations, while achieving up to 41.56x self-relative speedup on 48 cores with hyper-threading, and achieve better clustering results compared to the standard average-linkage and complete-linkage hierarchical clustering algorithms. We show that on a stock data set, our algorithms produce clusters that align well with human experts' classification.
翻译:根据一组对象中所有对称加权数( 距离), 过滤式图表提供稀疏的表示方式, 只能保留一个重要的加权子组。 此类图表可以传递到图形组算法中, 以生成等级组。 特别是, 过滤式图形中导出的泡泡级树( DBHT) 算法( DBHT) 算法( DBHT) 已经显示为时间序列数据生成了良好的等级组。 我们提出了一个新的平行算法, 用于构建三角最大过滤式图表( TMFG), 以及用于生成 DBHT 投入优化的可缩放平行算法。 除了将原始的 TMFG 构建平行到图形组算法, 以生成等级组算法来生成等级组。 我们的新平行算法( TMFG) 和 DBHT 的分类算法比TMFG 投入优化了136-2483x 。 此外, 我们还设计了一个新的算法, 在每回合中插入多个顶级组算法, 将实现自我更新的、 和高端级组的排序的 和高级组 的算法。</s>