In this paper, we propose an active metric learning method for clustering with pairwise constraints. The proposed method actively queries the labels of informative instance pairs while estimating the underlying metric with the help of unlabeled instance pairs, which leads to a more accurate and efficient clustering process. In particular, we augment the queried constraints by generating additional pairwise labels, which supply extra information for learning a metric and thereby enhance clustering performance. Furthermore, we increase the robustness of metric learning by updating the learned metric sequentially and penalizing irrelevant features adaptively. In addition, we propose a novel active query strategy that evaluates the information gain of instance pairs more accurately by incorporating the neighborhood structure, which improves clustering efficiency without extra labeling cost. Theoretically, we establish a tighter error bound for the proposed metric learning method with augmented queries than for methods that use the existing constraints only. We also investigate the improvement obtained by using the active query strategy instead of random selection. Numerical studies on simulated and real datasets indicate that the proposed method is especially advantageous when the signal-to-noise ratio between significant features and irrelevant features is low.
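As a rough illustration of the ingredients named above (a metric learned from pairwise constraints, a penalty on irrelevant features, and an active query rule), the following Python sketch may help fix ideas. It is not the proposed method: the diagonal metric, the logistic pairwise loss, the fixed L1 penalty, the uncertainty-based query rule, and all function and parameter names are assumptions chosen for brevity, and constraint augmentation and the neighborhood-aware information gain are omitted.

```python
import numpy as np


def sigmoid(x):
    # Clip to avoid overflow in exp for large-magnitude scores.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))


def squared_diffs(X, pairs):
    # Per-feature squared differences for each (i, j) pair, shape (m, d).
    pairs = np.asarray(pairs)
    return (X[pairs[:, 0]] - X[pairs[:, 1]]) ** 2


def fit_diagonal_metric(X, pairs, y, lam=0.1, lr=0.05, n_iter=300):
    """Learn nonnegative feature weights w and a threshold b.

    y[k] = +1 for a must-link pair, -1 for a cannot-link pair.
    Assumed model: P(must-link) = sigmoid(b - sum_f w[f] * diff_f**2).
    """
    Z = squared_diffs(X, pairs)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        s = b - Z @ w                     # pair scores
        g = -y * sigmoid(-y * s)          # d(logistic loss) / d(score)
        grad_w = -(g[:, None] * Z).mean(axis=0)
        grad_b = g.mean()
        # Gradient step, then L1 shrinkage and projection onto w >= 0.
        w = np.maximum(w - lr * grad_w - lr * lam, 0.0)
        b -= lr * grad_b
    return w, b


def most_uncertain_pair(X, candidates, w, b):
    # Uncertainty sampling (a stand-in query rule): index of the candidate
    # whose predicted must-link probability is closest to 0.5.
    p = sigmoid(b - squared_diffs(X, candidates) @ w)
    return int(np.argmin(np.abs(p - 0.5)))
```

In an active loop under these assumptions, one would alternate between calling `most_uncertain_pair` on the unlabeled pairs, obtaining the must-link or cannot-link label for the selected pair, and refitting with `fit_diagonal_metric`. The L1 shrinkage drives the weights of features that do not help separate the constraints toward zero, which loosely mirrors the low signal-to-noise setting described above.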