$k$-means and $k$-median clustering are powerful unsupervised machine learning techniques. However, because the resulting cluster assignments depend in complicated ways on all the features, they are challenging to interpret. Moshkovitz, Dasgupta, Rashtchian, and Frost [ICML 2020] proposed an elegant model of explainable $k$-means and $k$-median clustering. In this model, a decision tree with $k$ leaves provides a straightforward characterization of how the data set is partitioned into clusters. We study two natural algorithmic questions about explainable clustering. (1) For a given clustering, how can we find the "best explanation" using a decision tree with $k$ leaves? (2) For a given set of points, how can we find a decision tree with $k$ leaves that minimizes the $k$-means/median objective of the resulting explainable clustering? To address the first question, we introduce a new model of explainable clustering, inspired by the notion of outliers in robust statistics: we seek a small number of points (outliers) whose removal makes the existing clustering well-explainable. To address the second question, we initiate the study of the model of Moshkovitz et al. from the perspective of multivariate complexity. Our rigorous algorithmic analysis sheds some light on how parameters such as the input size, the dimension of the data, the number of outliers, the number of clusters, and the approximation ratio influence the computational complexity of explainable clustering.
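To make the two questions concrete, here is a minimal Python sketch. It assumes the threshold-tree form of the Moshkovitz et al. model, where each internal node compares a single feature against a threshold. The names `Node`, `assign`, `kmeans_cost`, and `outliers_for_tree` are hypothetical illustrations, not the authors' code. `kmeans_cost` evaluates the $k$-means objective of the clustering a tree induces (question 2 asks for the tree minimizing this). `outliers_for_tree` counts, for one *fixed* tree, how many points disagree with a given clustering under the best one-to-one matching of leaves to clusters. This is an upper bound on the outlier count in question 1, which minimizes over all trees. The brute force over $k!$ matchings is only meant for small $k$.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Sequence

Point = Sequence[float]


@dataclass
class Node:
    """Threshold-tree node (hypothetical representation).

    Internal nodes send x to `left` if x[dim] <= threshold, else to `right`.
    Leaves carry a `leaf_id` in {0, ..., k-1}.
    """
    dim: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: Optional[int] = None


def assign(tree: Node, x: Point) -> int:
    """Follow single-feature threshold cuts from the root down to a leaf."""
    node = tree
    while node.leaf_id is None:
        node = node.left if x[node.dim] <= node.threshold else node.right
    return node.leaf_id


def kmeans_cost(points: List[Point], labels: List[int], k: int) -> float:
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        d = len(members[0])
        centroid = [sum(p[j] for p in members) / len(members) for j in range(d)]
        cost += sum(sum((p[j] - centroid[j]) ** 2 for j in range(d)) for p in members)
    return cost


def outliers_for_tree(tree: Node, points: List[Point], labels: List[int], k: int) -> int:
    """Fewest points to delete so that this fixed tree reproduces `labels`.

    Tries every one-to-one matching of leaves to clusters (k! options),
    so it is only practical for small k.
    """
    leaves = [assign(tree, p) for p in points]
    best = len(points)
    for match in permutations(range(k)):  # match[leaf] = cluster label
        mismatches = sum(1 for leaf, lab in zip(leaves, labels) if match[leaf] != lab)
        best = min(best, mismatches)
    return best


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    # A single cut on feature 0 at threshold 5 gives a tree with k = 2 leaves.
    tree = Node(dim=0, threshold=5.0, left=Node(leaf_id=0), right=Node(leaf_id=1))
    tree_labels = [assign(tree, p) for p in pts]
    print(tree_labels, kmeans_cost(pts, tree_labels, 2))
    # A clustering that puts (1, 0) with the far group needs one outlier for this tree.
    print(outliers_for_tree(tree, pts, [0, 0, 1, 1, 1, 1], 2))
```

For $k$-median, the centroid would be replaced by the cost-minimizing center for the chosen distance (for example, the coordinate-wise median under $\ell_1$).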