We provide a new bi-criteria $\tilde{O}(\log^2 k)$-competitive algorithm for explainable $k$-means clustering. Explainable $k$-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). An explainable clustering is described by a (threshold) decision tree or diagram that is easy to interpret and understand. The cost of an explainable $k$-means clustering equals the sum of the costs of its clusters, and the cost of each cluster equals the sum of squared distances from the points in the cluster to the center of that cluster. The best non-bi-criteria algorithm for explainable clustering is $\tilde{O}(k)$-competitive, and this bound is tight. Our randomized bi-criteria algorithm constructs a threshold decision tree that partitions the data set into $(1+\delta)k$ clusters (where $\delta\in (0,1)$ is a parameter of the algorithm). The cost of this clustering is at most $\tilde{O}(1/ \delta \cdot \log^2 k)$ times the cost of the optimal unconstrained $k$-means clustering. We show that this bound is almost optimal.
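To make the two definitions above concrete, here is a minimal Python sketch of how a threshold decision tree assigns points to clusters and how the $k$-means cost of the resulting clustering is computed. It is an illustration only, not the algorithm from the paper: the names `Leaf`, `Node`, `assign`, and `kmeans_cost` are hypothetical, the tree is written by hand rather than constructed by any algorithm, and each cluster's center is assumed to be the mean of its points.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union

Point = Sequence[float]


@dataclass
class Leaf:
    cluster: int  # label of the cluster represented by this leaf


@dataclass
class Node:
    feature: int      # index of the coordinate compared at this node
    threshold: float  # points with x[feature] <= threshold go left
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def assign(tree: Tree, x: Point) -> int:
    """Follow single-coordinate threshold tests from the root down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.cluster


def kmeans_cost(points: List[Point], labels: List[int]) -> float:
    """Sum over clusters of squared distances to the cluster center.

    Assumption: the center of a cluster is the mean of its points.
    """
    clusters: Dict[int, List[Point]] = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        center = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        total += sum((p[j] - center[j]) ** 2 for p in members for j in range(dim))
    return total


# Toy data: two well-separated groups along the first coordinate.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Hand-written tree with a single split "x[0] <= 5" and two leaves.
tree = Node(feature=0, threshold=5.0, left=Leaf(0), right=Leaf(1))

labels = [assign(tree, p) for p in points]
cost = kmeans_cost(points, labels)
print(labels, cost)
```

In this toy instance the single threshold cut separates the two groups, so the tree-induced clustering coincides with the natural 2-means clustering. In the bi-criteria setting described above, the tree would be allowed up to $(1+\delta)k$ leaves, while its cost is compared against the optimal unconstrained clustering with $k$ centers.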