We consider the problem of learning the hierarchical cluster structure of graphs in the seeded model, where, besides the input graph, the algorithm is given a small number of "seeds", i.e., correctly clustered data points. In particular, we ask whether one can approximate the Dasgupta cost of a graph, a popular measure of hierarchical clusterability, in sublinear time and using a small number of seeds. Our main result is an $O(\sqrt{\log k})$ approximation to the Dasgupta cost of $G$ in $\approx \text{poly}(k)\cdot n^{1/2+O(\epsilon)}$ time using $\approx \text{poly}(k)\cdot n^{O(\epsilon)}$ seeds. This effectively gives a sublinear-time simulation of the algorithm of Charikar and Chatziafratis [SODA'17] on clusterable graphs. To the best of our knowledge, ours is the first result on approximating the hierarchical clustering properties of such graphs in sublinear time.
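For reference, the quantity being approximated is the Dasgupta cost of a hierarchy tree $T$ on a weighted graph $G=(V,E,w)$, defined in the standard way as $\mathrm{cost}_G(T)=\sum_{(u,v)\in E} w(u,v)\,|\mathrm{leaves}(T[u\vee v])|$, where $T[u\vee v]$ is the subtree rooted at the lowest common ancestor of $u$ and $v$. The sketch below evaluates this definition by brute force for a given tree. It is not the sublinear-time algorithm of this work. The nested-tuple tree representation and the function names `leaves` and `dasgupta_cost` are hypothetical choices made only for illustration.

```python
# Hypothetical representation: a hierarchy tree is a nested tuple whose
# non-tuple entries are vertex labels (the leaves).

def leaves(tree):
    """Return the set of vertex labels below `tree`."""
    if isinstance(tree, tuple):
        out = set()
        for child in tree:
            out |= leaves(child)
        return out
    return {tree}


def dasgupta_cost(tree, edges):
    """Brute-force Dasgupta cost: sum over weighted edges (u, v, w) of
    w * |leaves(T[u v v])|, where T[u v v] is the subtree rooted at the
    lowest common ancestor of u and v."""
    total = 0
    for u, v, w in edges:
        node = tree
        # Descend while some single child still contains both endpoints;
        # the node where this stops is the lowest common ancestor.
        while isinstance(node, tuple):
            nxt = None
            for child in node:
                below = leaves(child)
                if u in below and v in below:
                    nxt = child
                    break
            if nxt is None:
                break
            node = nxt
        total += w * len(leaves(node))
    return total


path = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1)]
print(dasgupta_cost((("a", "b"), ("c", "d")), path))    # 2 + 4 + 2 = 8
print(dasgupta_cost(((("a", "b"), "c"), "d"), path))    # 2 + 3 + 4 = 9
```

On the path $a-b-c-d$ with unit weights, the balanced tree has lower cost than the caterpillar tree, which matches the intuition that good hierarchies separate sparsely connected parts of the graph near the root.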