We propose a highly data-efficient active learning framework for image classification. The framework combines, in sequence, (1) unsupervised representation learning with a Convolutional Neural Network and (2) the Gaussian Process (GP) method, to achieve highly data- and label-efficient classification. Moreover, both elements are less sensitive to the prevalent and challenging class imbalance issue, thanks to (1) features learned without labels and (2) the Bayesian nature of GP. The GP-provided uncertainty estimates enable active learning: samples are ranked by uncertainty, and those showing higher uncertainty are selectively labeled. We apply this novel combination to the severely imbalanced cases of COVID-19 chest X-ray classification and Nerthus colonoscopy classification. We demonstrate that only about 10% of the labeled data is needed to reach the accuracy obtained by training on all available labels. We also applied our model architecture and proposed framework to a broader class of datasets, with the expected success.
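
The uncertainty-based selection step described above can be illustrated with a minimal sketch. It assumes the classifier (for example, a GP fitted on the learned features) already yields per-sample class probabilities, and it uses predictive entropy as the uncertainty score. The entropy criterion, the function names `predictive_entropy` and `select_for_labeling`, and the probability values are all illustrative assumptions, not details taken from this work.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predictive class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(pool_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled samples.

    pool_probs: one list of class probabilities per unlabeled sample,
    assumed to come from the classifier's predictive distribution.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical predictive probabilities for five unlabeled images, two classes.
pool_probs = [
    [0.98, 0.02],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.50, 0.50],
    [0.10, 0.90],
]

# Indices of the two samples that would be sent for labeling next.
queried = select_for_labeling(pool_probs, budget=2)
print(queried)
```

In an active learning loop, the queried samples would be labeled, added to the training set, and the classifier refit before the next ranking round.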