We investigate the problem of exact cluster recovery using oracle queries. Previous results show that clusters in Euclidean spaces that are convex and separated with a margin can be reconstructed exactly using only $O(\log n)$ same-cluster queries, where $n$ is the number of input points. In this work, we study this problem in the more challenging non-convex setting. We introduce a structural characterization of clusters, called $(\beta,\gamma)$-convexity, that can be applied to any finite set of points equipped with a metric (or even a semimetric, as the triangle inequality is not needed). Using $(\beta,\gamma)$-convexity, we can translate natural density properties of clusters (which include, for instance, clusters that are strongly non-convex in $\mathbb{R}^d$) into a graph-theoretic notion of convexity. By exploiting this convexity notion, we design a deterministic algorithm that recovers $(\beta,\gamma)$-convex clusters using $O(k^2 \log n + k^2 (6/(\beta\gamma))^{\mathrm{dens}(X)})$ same-cluster queries, where $k$ is the number of clusters and $\mathrm{dens}(X)$ is the density dimension of the semimetric. We show that an exponential dependence on the density dimension is necessary, and we also show that, if we are allowed to make $O(k^2 + k\log n)$ additional queries to a "cluster separation" oracle, then we can recover clusters that have different and arbitrary scales, even when the scale of each cluster is unknown.
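As a toy illustration of why a logarithmic number of same-cluster queries can suffice in the convex case, the sketch below handles only the simplest setting: points on a line whose clusters are contiguous intervals. It is not the algorithm of this work and does not implement $(\beta,\gamma)$-convexity. The names `CountingOracle` and `recover_1d_clusters` are hypothetical, and the oracle is simulated from known ground-truth labels purely for demonstration.

```python
from typing import Callable, List, Sequence


class CountingOracle:
    """Simulated same-cluster oracle (assumption: ground truth is known here)."""

    def __init__(self, truth: Sequence[int]) -> None:
        self.truth = truth
        self.queries = 0

    def __call__(self, i: int, j: int) -> bool:
        # Each call counts as one same-cluster query.
        self.queries += 1
        return self.truth[i] == self.truth[j]


def recover_1d_clusters(
    points: Sequence[float],
    same_cluster: Callable[[int, int], bool],
) -> List[int]:
    """Label points on a line, assuming every cluster is a contiguous interval."""
    n = len(points)
    order = sorted(range(n), key=lambda i: points[i])
    labels = [-1] * n
    start, label = 0, 0
    while start < n:
        # Binary search for the last sorted position that shares a cluster
        # with order[start]; the oracle answer is monotone along the line.
        lo, hi = start, n - 1
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if same_cluster(order[start], order[mid]):
                lo = mid
            else:
                hi = mid - 1
        for pos in range(start, lo + 1):
            labels[order[pos]] = label
        start = lo + 1
        label += 1
    return labels


points = [0.1, 5.0, 0.3, 9.2, 5.4, 0.2, 9.9, 5.1]
truth = [0, 1, 0, 2, 1, 0, 2, 1]
oracle = CountingOracle(truth)
labels = recover_1d_clusters(points, oracle)
```

Each cluster costs at most $\lceil \log_2 n \rceil$ queries in this sketch, so the total is at most $k \lceil \log_2 n \rceil$. The non-convex setting studied here requires the additional machinery described in the abstract.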