Exemplar learning of visual similarities in an unsupervised manner is a problem of paramount importance to computer vision. In this context, however, the recent breakthrough in deep learning has not yet unfolded its full potential. With only a single positive sample, a great imbalance between one positive and many negatives, and unreliable relationships between most samples, the training of convolutional neural networks (CNNs) is impaired. In this paper, we use weak estimates of local similarities and propose a single optimization problem to extract batches of samples with mutually consistent relations. Conflicting relations are distributed over different batches, and similar samples are grouped into compact groups. Learning visual similarities is then framed as a sequence of categorization tasks. The CNN consolidates transitivity relations within and between groups and learns a single representation for all samples without the need for labels. The proposed unsupervised approach shows competitive performance on detailed posture analysis and object classification.
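To make the batching idea concrete, the following is a minimal sketch and not the optimization problem proposed in the paper: it swaps in a simple greedy heuristic. It assumes a precomputed weak similarity matrix `sim`; the function names `compact_groups` and `build_batches` and the parameters `seeds`, `k`, and `max_cross_sim` are hypothetical and chosen only for illustration.

```python
import numpy as np

def compact_groups(sim, seeds, k):
    """Group each seed with its k most similar samples (weak local similarity).

    Assumption: `sim` is a symmetric (n, n) array of weak similarity estimates.
    """
    groups = []
    for s in seeds:
        order = np.argsort(-sim[s])  # indices sorted by decreasing similarity to s
        neighbors = [int(j) for j in order if j != s][:k]
        groups.append([int(s)] + neighbors)
    return groups

def build_batches(sim, groups, max_cross_sim):
    """Greedy stand-in for the paper's optimization (an assumption, not the
    authors' method): a group joins the first batch in which it is clearly
    dissimilar to every group already there; otherwise it opens a new batch.
    Groups with ambiguous (conflicting) relations thus land in different batches.
    """
    batches = []
    for g in groups:
        for batch in batches:
            # Highest similarity between any member of g and any member of h.
            if all(sim[np.ix_(g, h)].max() < max_cross_sim for h in batch):
                batch.append(g)
                break
        else:
            batches.append([g])
    return batches
```

Under these assumptions, each batch could then serve as one categorization task, with the index of a group inside the batch used as a surrogate class label for its members.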