Deep learning models such as Convolutional Neural Networks (CNNs) have proven highly effective in a variety of domains, including computer vision and, more recently, computational biology. However, training effective models often requires assembling and/or labeling large datasets, which may be prohibitively time-consuming or costly. Pool-based active learning techniques have the potential to mitigate these issues by using models trained on limited data to selectively query unlabeled data points from a pool, with the aim of expediting the learning process. Here we present "Dropout-based Expected IMprOvementS" (DEIMOS), a flexible and computationally efficient approach to active learning that queries the points expected to maximize the model's improvement across a representative sample of points. The proposed framework enables us to maintain a prediction covariance matrix capturing model uncertainty, and to dynamically update this matrix in order to generate diverse batches of points in the batch-mode setting. Our active learning results demonstrate that DEIMOS outperforms several existing baselines across multiple regression and classification tasks taken from computer vision and genomics.
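To make the batch-mode idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the acquisition score or the covariance update, so both are assumptions here: the covariance is estimated from repeated stochastic (dropout) forward passes, each candidate is scored by the total variance it would remove across the pool, and the matrix is updated with a standard Gaussian conditioning step. The names `select_batch`, `mc_preds`, and `noise` are hypothetical.

```python
import numpy as np


def select_batch(mc_preds, batch_size, noise=1e-6):
    """Greedily pick a diverse batch from dropout predictions.

    mc_preds: array of shape (T, N) holding T stochastic forward passes
              (e.g. with dropout left on) over N unlabeled pool points.
    noise:    assumed observation-noise term; also keeps divisions stable.
    """
    # Empirical prediction covariance across the T passes, shape (N, N).
    cov = np.atleast_2d(np.cov(mc_preds, rowvar=False))
    chosen = []
    for _ in range(batch_size):
        diag = np.diag(cov) + noise
        # Assumed score: total variance removed over all points if
        # point i were observed (column-wise sum of squared covariances).
        scores = (cov ** 2).sum(axis=0) / diag
        if chosen:
            scores[chosen] = -np.inf  # never pick the same point twice
        i = int(np.argmax(scores))
        chosen.append(i)
        # Assumed update: Gaussian conditioning on point i. Points that are
        # strongly correlated with i lose most of their variance, which
        # discourages redundant picks later in the batch.
        col = cov[:, i].copy()
        cov = cov - np.outer(col, col) / diag[i]
    return chosen
```

Because the covariance is updated after every pick, two pool points whose dropout predictions move together are unlikely to land in the same batch, which is the diversity property the abstract describes.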