Metric clustering is fundamental in areas ranging from Combinatorial Optimization and Data Mining, to Machine Learning and Operations Research. However, in a variety of situations we may have additional requirements or knowledge, distinct from the underlying metric, regarding which pairs of points should be clustered together. To capture and analyze such scenarios, we introduce a novel family of \emph{stochastic pairwise constraints}, which we incorporate into several essential clustering objectives (radius/median/means). Moreover, we demonstrate that these constraints can succinctly model an intriguing collection of applications, including among others \emph{Individual Fairness} in clustering and \emph{Must-link} constraints in semi-supervised learning. Our main result consists of a general framework that yields approximation algorithms with provable guarantees for important clustering objectives, while at the same time producing solutions that respect the stochastic pairwise constraints. Furthermore, for certain objectives we devise improved results in the case of Must-link constraints, which are also the best possible from a theoretical perspective. Finally, we present experimental evidence that validates the effectiveness of our algorithms.
翻译:在从组合优化和数据挖掘到机器学习和操作研究等一系列领域,计量集群至关重要。然而,在各种情况下,我们可能有额外的要求或知识,不同于基本指标,关于哪些点应该组合在一起。为了捕捉和分析这些假设,我们引入了一个新型的组合,我们将其纳入若干基本组合目标(辐射/中间/手段)中。此外,我们证明这些制约因素可以简明地模拟在集群和半监督学习中的制约方面令人感兴趣的应用集,包括 emph{内部公平} 和\emph{most-link}。我们的主要结果包括一个总框架,产生具有重要组合目标可调控保证的近似算法,同时产生尊重对称制约的解决办法。此外,对于某些目标,我们从理论角度讲也是最好的,我们提出实验性证据,用以验证我们算法的有效性。