Active learning promises to improve annotation efficiency by iteratively selecting the most important data to annotate first. However, we uncover a striking contradiction to this promise: active learning fails to select data as efficiently as random selection in the first few choices. We identify this as the cold start problem in vision active learning, caused by a biased and outlier-heavy initial query. This paper seeks to address the cold start problem by exploiting three advantages of contrastive learning: (1) no annotation is required; (2) label diversity is ensured by pseudo-labels, mitigating bias; (3) typical data is identified by contrastive features, reducing outliers. Experiments are conducted on CIFAR-10-LT and three medical imaging datasets (i.e., Colon Pathology, Abdominal CT, and Blood Cell Microscope). Our initial query not only significantly outperforms existing active querying strategies but also surpasses random selection by a large margin. We foresee our solution to the cold start problem serving as a simple yet strong baseline for choosing the initial query in vision active learning. Code is available at https://github.com/c-liangyu/CSVAL
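The querying idea described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact scoring: assuming embeddings from any pretrained contrastive encoder (e.g., SimCLR or MoCo), it derives pseudo-labels with K-means to spread the annotation budget across clusters (label diversity) and uses distance to the assigned centroid as a typicality proxy (outlier reduction). The function name `cold_start_query`, the `num_clusters` parameter, and the centroid-distance score are illustrative assumptions; see the linked repository for the actual implementation.

```python
import numpy as np
from sklearn.cluster import KMeans


def cold_start_query(features, budget, num_clusters=10, seed=0):
    """Pick an unlabeled initial query from contrastive embeddings.

    features: (N, D) array of embeddings from a pretrained contrastive
        encoder; no annotations are needed.
    budget:   number of samples to send for annotation first.
    """
    # Diversity: K-means pseudo-labels spread the budget across clusters,
    # mitigating the class bias of an uninformed first query.
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=seed)
    pseudo = km.fit_predict(features)

    # Typicality proxy: distance to the assigned centroid; a small distance
    # means the sample sits in a dense, representative region (fewer outliers).
    dist = np.linalg.norm(features - km.cluster_centers_[pseudo], axis=1)

    selected = []
    per_cluster = max(budget // num_clusters, 1)
    for c in range(num_clusters):
        members = np.where(pseudo == c)[0]
        ranked = members[np.argsort(dist[members])]  # most typical first
        selected.extend(ranked[:per_cluster].tolist())

    # Top up to the exact budget with the most typical remaining samples
    # when the budget is not divisible by num_clusters.
    chosen = set(selected)
    for i in np.argsort(dist):
        if len(selected) >= budget:
            break
        if int(i) not in chosen:
            selected.append(int(i))
            chosen.add(int(i))
    return np.asarray(selected[:budget])


if __name__ == "__main__":
    # Toy usage with random features standing in for real embeddings.
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((1000, 128)).astype(np.float32)
    print(cold_start_query(feats, budget=20))
```

Ranking within each pseudo-class, rather than globally, is what keeps the query both diverse and typical: a single global typicality ranking would concentrate the budget on the densest cluster and reintroduce the bias the method is meant to avoid.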