We propose a new method for approximating active learning acquisition strategies that are based on retraining with hypothetically-labeled candidate data points. Although retraining for every candidate is usually infeasible with deep networks, we use the neural tangent kernel to approximate the result of retraining, and prove that this approximation holds asymptotically even in an active learning setup, so that "look-ahead" selection criteria can be approximated at far lower computational cost. This also enables us to conduct sequential active learning, i.e., updating the model in a streaming regime, without retraining the model with SGD after each new data point is added. Moreover, our querying strategy, which better anticipates how the model's predictions will change when new data points are added than the standard ("myopic") criteria do, outperforms other look-ahead strategies by large margins, and achieves equal or better performance than state-of-the-art methods on several benchmark datasets for pool-based active learning.
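To make the look-ahead idea concrete, the sketch below uses closed-form kernel regression as a stand-in for the NTK approximation of retraining: the effect of adding a hypothetically-labeled candidate on the pool predictions is computed in closed form, so no per-candidate SGD retraining is required. This is only an illustrative sketch, not the paper's implementation; the RBF kernel (in place of the empirical NTK), the uniform weighting over hypothetical labels, and the helper names `lookahead_change` and `score_candidates` are all assumptions made for the example.

```python
# Minimal sketch of a retraining-free "look-ahead" acquisition score.
# The RBF kernel `k` is a placeholder for the empirical NTK Theta(.,.).
import numpy as np

def k(A, B, gamma=1.0):
    # Placeholder kernel; in the NTK setting this would be Theta(A, B).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_predict(X_tr, y_tr, X_te, reg=1e-6):
    # Closed-form prediction of a kernel regressor trained on (X_tr, y_tr):
    # f(X_te) = K(X_te, X_tr) [K(X_tr, X_tr) + reg*I]^{-1} y_tr.
    K_tt = k(X_tr, X_tr) + reg * np.eye(len(X_tr))
    return k(X_te, X_tr) @ np.linalg.solve(K_tt, y_tr)

def lookahead_change(X_tr, y_tr, x_cand, y_hyp, X_pool):
    # How much do pool predictions move if the candidate is added with the
    # hypothetical label y_hyp? Computed without any gradient-based retraining.
    before = kernel_predict(X_tr, y_tr, X_pool)
    X_aug = np.vstack([X_tr, x_cand[None, :]])
    y_aug = np.append(y_tr, y_hyp)
    after = kernel_predict(X_aug, y_aug, X_pool)
    return np.abs(after - before).mean()

def score_candidates(X_tr, y_tr, X_cand, X_pool, labels=(0.0, 1.0)):
    # Average look-ahead change over hypothetical labels (uniform weighting
    # here purely for illustration); larger scores mean more influential points.
    return np.array([
        np.mean([lookahead_change(X_tr, y_tr, x, y, X_pool) for y in labels])
        for x in X_cand
    ])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_tr, y_tr = rng.normal(size=(20, 2)), rng.integers(0, 2, 20).astype(float)
    X_cand, X_pool = rng.normal(size=(5, 2)), rng.normal(size=(50, 2))
    print(score_candidates(X_tr, y_tr, X_cand, X_pool))
```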