Constrained clustering has gained significant attention in machine learning because it can leverage prior information on a growing amount of only partially labeled data. Following recent advances in deep generative models, we propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently using stochastic gradient variational inference. By explicitly integrating domain knowledge in the form of probabilistic relations, our proposed model (DC-GMM) uncovers the underlying distribution of the data conditioned on prior clustering preferences, expressed as pairwise constraints. These constraints guide the clustering process towards a desirable partition of the data by indicating which samples should or should not belong to the same cluster. We provide extensive experiments demonstrating that DC-GMM achieves superior clustering performance and robustness compared to state-of-the-art deep constrained clustering methods on a wide range of data sets. We further demonstrate the usefulness of our approach on two challenging real-world applications.