In this paper, we propose a generic framework for active clustering with queries for pairwise similarities between objects. The framework has four main properties. First, a pairwise similarity can be any positive or negative number, which gives full flexibility in the type of feedback that a user or annotator can provide. Second, the process of querying pairwise similarities is separated from the clustering algorithm, which allows more flexibility in how query strategies are constructed. Third, the framework is robust to noisy answers because the same pairwise similarity can be queried multiple times (i.e., a non-persistent noise model is assumed). Finally, the number of clusters is identified automatically from the currently known pairwise similarities. In addition, we propose and analyze several novel query strategies suited to this active clustering framework. We demonstrate the effectiveness of the framework and the proposed query strategies in several experimental studies.
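To make the four properties concrete, the following is a minimal sketch of such a loop, not the method proposed in the paper. It assumes that repeated noisy answers for a pair are simply averaged, that unqueried pairs count as similarity 0, and that a simple greedy assignment stands in for the clustering step. All names (`SimilarityStore`, `greedy_clustering`, `random_query_strategy`, `active_clustering`) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


class SimilarityStore:
    """Keeps the running mean of every answer received for an unordered pair."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def add(self, i, j, value):
        key = (min(i, j), max(i, j))
        self._sum[key] += value
        self._count[key] += 1

    def get(self, i, j):
        key = (min(i, j), max(i, j))
        n = self._count[key]
        # Assumption: a pair that was never queried contributes 0.
        return self._sum[key] / n if n else 0.0


def greedy_clustering(n_objects, store):
    """Stand-in clustering step: put each object into the cluster with the
    largest positive total similarity, or open a new cluster if none is
    positive. The number of clusters is therefore not fixed in advance."""
    clusters = []
    for obj in range(n_objects):
        best_gain, best_cluster = 0.0, None
        for cluster in clusters:
            gain = sum(store.get(obj, other) for other in cluster)
            if gain > best_gain:
                best_gain, best_cluster = gain, cluster
        if best_cluster is None:
            clusters.append([obj])
        else:
            best_cluster.append(obj)
    return clusters


def random_query_strategy(n_objects, rng):
    """Baseline strategy: pick a uniformly random pair of distinct objects."""
    i, j = rng.sample(range(n_objects), 2)
    return i, j


def active_clustering(n_objects, oracle, budget, seed=0,
                      strategy=random_query_strategy):
    """Query loop that is independent of the clustering step: any strategy
    with the same signature can be swapped in."""
    rng = random.Random(seed)
    store = SimilarityStore()
    for _ in range(budget):
        i, j = strategy(n_objects, rng)
        store.add(i, j, oracle(i, j))  # the same pair may be queried again
    return greedy_clustering(n_objects, store)


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]  # hidden ground truth, used only by the oracle
    noise = random.Random(1)

    def noisy_oracle(i, j):
        # Non-persistent noise: every call draws fresh noise for the answer.
        truth = 1.0 if labels[i] == labels[j] else -1.0
        return truth + noise.gauss(0.0, 0.5)

    print(active_clustering(len(labels), noisy_oracle, budget=300))
```

Because the query strategy is passed in as a plain function, a more informed strategy (for example, one that prefers pairs whose averaged similarity is close to zero) could replace `random_query_strategy` without touching the clustering code.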