Almost Tight Approximation Algorithms for Explainable Clustering

From arXiv, 27 pages. Added references to independent work, a table of results, pseudocode, and an improved introduction. Note: the first version was uploaded on July 1, 2021.

Recently, due to an increasing interest in transparency in artificial intelligence, several methods of explainable machine learning have been developed with the simultaneous goals of accuracy and interpretability by humans. In this paper, we study a recent framework of explainable clustering first suggested by Dasgupta et al.~\cite{dasgupta2020explainable}. Specifically, we focus on the $k$-means and $k$-medians problems and provide nearly tight upper and lower bounds. First, we provide an $O(\log k \log \log k)$-approximation algorithm for explainable $k$-medians, improving on the best known $O(k)$-approximation~\cite{dasgupta2020explainable} and nearly matching the known $\Omega(\log k)$ lower bound~\cite{dasgupta2020explainable}. In addition, in low-dimensional spaces with $d \ll \log k$, we show that our algorithm also provides an $O(d \log^2 d)$-approximate solution for explainable $k$-medians. This improves over the best known bound of $O(d \log k)$ for low dimensions~\cite{laber2021explainable}, and is a constant for constant-dimensional spaces. To complement this, we show a nearly matching $\Omega(d)$ lower bound. Next, we study the $k$-means problem in this context and provide an $O(k \log k)$-approximation algorithm for explainable $k$-means, improving over the $O(k^2)$ bound of Dasgupta et al. and the $O(d k \log k)$ bound of \cite{laber2021explainable}. To complement this, we provide an almost tight $\Omega(k)$ lower bound, improving over the $\Omega(\log k)$ lower bound of Dasgupta et al. Given an approximate solution to the classic $k$-means and $k$-medians problems, our algorithm for $k$-medians runs in time $O(kd \log^2 k)$ and our algorithm for $k$-means runs in time $O(k^2 d)$.
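To make the framework concrete: in the explainable clustering setting of Dasgupta et al., the clusters are the leaves of a decision tree whose internal nodes each compare a single coordinate against a threshold, and the approximation factors above measure how much the clustering cost grows relative to an unconstrained clustering. The following minimal Python sketch only illustrates that setting. It is not the algorithm of this paper, and every name in it (`Node`, `assign`, `k_medians_cost`, `k_means_cost`, the toy data and the hand-built tree) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Optional, Sequence

Point = Sequence[float]


@dataclass
class Node:
    """Threshold-tree node (illustrative, not from the paper).

    Internal node: axis-aligned cut `x[feature] <= threshold`.
    Leaf: `cluster` holds the cluster id and the other fields are unused.
    """
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    cluster: Optional[int] = None


def assign(tree: Node, x: Point) -> int:
    """Follow single-coordinate comparisons from the root down to a leaf."""
    node = tree
    while node.cluster is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.cluster


def k_medians_cost(points: List[Point], labels: List[int], k: int) -> float:
    """Sum of l1 distances to the coordinate-wise median of each cluster."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        center = [median(col) for col in zip(*members)]
        total += sum(abs(a - b) for p in members for a, b in zip(p, center))
    return total


def k_means_cost(points: List[Point], labels: List[int], k: int) -> float:
    """Sum of squared l2 distances to the mean of each cluster."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        center = [mean(col) for col in zip(*members)]
        total += sum((a - b) ** 2 for p in members for a, b in zip(p, center))
    return total


# Toy data (assumed): three well-separated pairs of points in the plane.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (10, 10), (11, 10)]

# Hand-built tree with k = 3 leaves: first cut on x0 <= 5, then on x1 <= 5.
tree = Node(
    feature=0, threshold=5.0,
    left=Node(cluster=0),
    right=Node(feature=1, threshold=5.0,
               left=Node(cluster=1), right=Node(cluster=2)),
)

labels = [assign(tree, p) for p in points]
medians_cost = k_medians_cost(points, labels, k=3)
means_cost = k_means_cost(points, labels, k=3)
```

On this toy input the tree happens to recover the natural clusters, so explainability costs nothing. The bounds in the abstract concern worst-case inputs, where restricting clusters to axis-aligned cuts can force a strictly higher cost than the best unconstrained clustering.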
