The availability of large labeled datasets is a key factor in the success of deep learning. However, annotating labels on large datasets is generally time-consuming and expensive. Active learning is a research area that addresses the cost of labeling by selecting the most informative samples for annotation. Diversity-based sampling algorithms are integral components of representation-based approaches to active learning. In this paper, we introduce a new diversity-based initial dataset selection algorithm that selects the most informative set of samples for initial labeling in the active learning setting. Self-supervised representation learning is used to capture the diversity of samples in the initial dataset selection algorithm. We also propose a novel active learning query strategy that applies diversity-based sampling to consistency-based embeddings. By combining consistency information with diversity in the consistency-based embedding scheme, the proposed method can select more informative samples for labeling in the semi-supervised learning setting. Comparative experiments show that, by exploiting the diversity of unlabeled data, the proposed method achieves compelling results on the CIFAR-10 and Caltech-101 datasets compared with previous active learning approaches.
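To make the diversity-based selection idea concrete, the sketch below applies a generic greedy k-center (core-set-style) rule to feature embeddings, e.g. those produced by a self-supervised or consistency-trained encoder. This is a minimal illustration under our own assumptions, not necessarily the exact selection rule or embedding scheme used in the paper; the function name and arguments are hypothetical.

```python
import numpy as np

def k_center_greedy(embeddings, budget, seed_indices=()):
    """Greedy k-center selection over an (N, D) embedding matrix.

    At each step the sample farthest from all previously chosen centers is
    added, so the selected set spreads out over the embedding space. This is
    a common diversity-based heuristic, shown here only as an illustration.
    """
    n = embeddings.shape[0]
    selected = list(seed_indices)           # e.g. indices of already-labeled samples
    if not selected:                        # cold start: pick an arbitrary first center
        selected.append(int(np.random.randint(n)))
        budget -= 1
    # Distance from every sample to its nearest selected center.
    min_dist = np.full(n, np.inf)
    for idx in selected:
        min_dist = np.minimum(
            min_dist, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    for _ in range(budget):
        next_idx = int(np.argmax(min_dist))  # most distant, i.e. most "novel", sample
        selected.append(next_idx)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(embeddings - embeddings[next_idx], axis=1))
    return selected
```

In an active learning loop, `embeddings` would come from the unlabeled pool and `seed_indices` from the currently labeled set, so each query round adds `budget` diverse samples for annotation.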