Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and to significantly improve clustering accuracy. Previous work has considered exact optimization formulations that can guarantee optimal clustering while satisfying all constraints, however these approaches lack interpretability. Recently, decision-trees have been used to produce inherently interpretable clustering solutions, however existing approaches do not support clustering constraints and do not provide strong theoretical guarantees on solution quality. In this work, we present a novel SAT-based framework for interpretable clustering that supports clustering constraints and that also provides strong theoretical guarantees on solution quality. We also present new insight into the trade-off between interpretability and satisfaction of such user-provided constraints. Our framework is the first approach for interpretable and constrained clustering. Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.
翻译:受限制的集群是一项半监督的任务,它使用有限的标记数据,作为限制,将特定领域的知识纳入其中,并大大提高集群的准确性。以前的工作考虑过精确的优化配方,既能保证最佳集群,又能满足所有制约因素,然而,这些方法缺乏解释性。最近,决策树被用于产生固有的可解释的集群解决方案,但现有办法并不支持集群制约,也没有为解决方案的质量提供强有力的理论保障。在这项工作中,我们提出了一个新的可解释的基于SAT的框架,用于可解释的集群,支持集群制约,并为解决方案的质量提供强有力的理论保障。我们还对这些用户提供的制约的可解释性和满意度进行了新的权衡。我们的框架是可解释和受限制的集群的第一种办法。与一系列现实世界和合成数据集的实验表明,我们的方法可以产生高质量和可解释的受限制的集群解决方案。