The acquisition of labels for supervised learning can be expensive. In order to improve the sample-efficiency of neural network regression, we study active learning methods that adaptively select batches of unlabeled data for labeling. We present a framework for constructing such methods out of (network-dependent) base kernels, kernel transformations and selection methods. Our framework encompasses many existing Bayesian methods based on Gaussian Process approximations of neural networks as well as non-Bayesian methods. Additionally, we propose to replace the commonly used last-layer features with sketched finite-width Neural Tangent Kernels, and to combine them with a novel clustering method. To evaluate different methods, we introduce an open-source benchmark consisting of 15 large tabular regression data sets. Our proposed method outperforms the state-of-the-art on our benchmark, scales to large data sets, and works out-of-the-box without adjusting the network architecture or training code. We provide open-source code that includes efficient implementations of all kernels, kernel transformations, and selection methods, and can be used for reproducing our results.