The price of explainability for a clustering task can be defined as the unavoidable loss,in terms of the objective function, if we force the final partition to be explainable. Here, we study this price for the following clustering problems: $k$-means, $k$-medians, $k$-centers and maximum-spacing. We provide upper and lower bounds for a natural model where explainability is achieved via decision trees. For the $k$-means and $k$-medians problems our upper bounds improve those obtained by [Moshkovitz et. al, ICML 20] for low dimensions. Another contribution is a simple and efficient algorithm for building explainable clusterings for the $k$-means problem. We provide empirical evidence that its performance is better than the current state of the art for decision-tree based explainable clustering.
翻译:集群任务的解释性价格可以定义为不可避免的损失,从客观功能的角度来看,如果我们强制解释最终分区的话。在这里,我们研究以下组群问题的价格:美元-平均值、美元-中间值、美元-中间值、美元-中间值和最大间距。我们为通过决策树实现解释性的自然模型提供了上下界限。对于美元-平均值和美元-中间值,我们的上界限问题在于如何改善[Moshkovitz等人,ICML 20]获得的低维度数据。另一个贡献是为美元-中间值问题建立可解释的组群的简单而有效的算法。我们提供了经验证据,证明其性能优于基于决策树解释的集群的目前状态。