Recent work on explainable clustering can describe clusters when the features are interpretable. However, much of modern machine learning focuses on complex data such as images, text, and graphs, where deep learning is used but the raw features are not interpretable. This paper explores a novel setting: clustering complex data while simultaneously generating explanations from interpretable tags. We propose deep descriptive clustering, which performs sub-symbolic representation learning on complex data while generating explanations based on symbolic data. We form good clusters by maximizing the mutual information between the empirical distribution of the inputs and the induced cluster labels. We generate explanations by solving an integer linear program that produces concise and orthogonal descriptions for each cluster. Finally, we allow the explanations to inform better clustering by proposing a novel pairwise loss with self-generated constraints that maximizes the consistency between the clustering and explanation modules. Experimental results on public data demonstrate that our model outperforms competitive baselines in clustering performance while offering high-quality cluster-level explanations.
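The mutual-information clustering objective mentioned in the abstract can be illustrated with a small sketch. It assumes the network outputs soft cluster assignments p(y | x), one row per input, and uses the standard decomposition I(X; Y) = H(Y) - H(Y | X). The abstract does not give the paper's exact formulation, so the function name `mutual_information` and the toy inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def mutual_information(probs, eps=1e-12):
    """Estimate I(X; Y) = H(Y) - H(Y | X) from soft cluster assignments.

    probs: array of shape (n_samples, n_clusters); each row is p(y | x)
    and sums to 1. This is an assumed input format for illustration.
    """
    probs = np.asarray(probs, dtype=float)
    # Empirical marginal over clusters: p(y) = mean over samples of p(y | x).
    marginal = probs.mean(axis=0)
    # H(Y): high when clusters are used evenly.
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    # H(Y | X): low when each sample is assigned confidently.
    h_conditional = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return h_marginal - h_conditional


# Confident and balanced assignments: the objective is at its maximum, log(2).
confident_balanced = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
# Uniform assignments: every sample is maximally uncertain.
uniform = np.full((4, 2), 0.5)
# Collapsed assignments: every sample lands in the same cluster.
collapsed = np.array([[1.0, 0.0]] * 4)

for name, p in [("balanced", confident_balanced),
                ("uniform", uniform),
                ("collapsed", collapsed)]:
    print(name, round(float(mutual_information(p)), 4))
```

Maximizing this quantity rewards assignments that are both confident per sample and balanced across clusters, which is why the uniform and collapsed cases score near zero. In a training loop one would minimize the negative of this value with an autodiff framework; the NumPy version here only evaluates it.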