Neural Active Learning on Heteroskedastic Distributions

Models that can actively seek out the best quality training data hold the promise of more accurate, adaptable, and efficient machine learning. State-of-the-art active learning techniques tend to prefer examples that are the most difficult to classify. While this works well on homogeneous datasets, we find that it can lead to catastrophic failures when performed on multiple distributions with different degrees of label noise or heteroskedasticity. These active learning algorithms strongly prefer to draw from the distribution with more noise, even if their examples have no informative structure (such as solid color images with random labels). To this end, we demonstrate the catastrophic failure of these active learning algorithms on heteroskedastic distributions and propose a fine-tuning-based approach to mitigate these failures. Further, we propose a new algorithm that incorporates a model difference scoring function for each data point to filter out the noisy examples and sample clean examples that maximize accuracy, outperforming the existing active learning techniques on the heteroskedastic datasets. We hope these observations and techniques are immediately helpful to practitioners and can help to challenge common assumptions in the design of active learning algorithms.
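The failure mode and the filtering idea described above can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not define the model difference score, so the L1 distance between two models' predicted probabilities is used here as a stand-in, and the assumption that noisy points are the ones with the largest difference is ours. The function names, the `drop_fraction` parameter, and the toy probabilities are all hypothetical.

```python
import numpy as np


def margin_uncertainty(probs):
    """Return 1 - (top-1 minus top-2 probability); higher means harder to classify."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return 1.0 - (top2[:, 1] - top2[:, 0])


def most_uncertain(probs, batch_size):
    """Plain uncertainty sampling: indices of the batch_size most uncertain points."""
    return np.argsort(-margin_uncertainty(probs), kind="stable")[:batch_size]


def model_difference(probs_a, probs_b):
    """Hypothetical per-point score: L1 distance between two models' predictions."""
    return np.abs(probs_a - probs_b).sum(axis=1)


def filtered_selection(probs_a, probs_b, batch_size, drop_fraction=0.5):
    """Drop the points with the largest model difference, then sample by uncertainty."""
    diff = model_difference(probs_a, probs_b)
    n_keep = max(batch_size, int(round(len(diff) * (1.0 - drop_fraction))))
    kept = np.argsort(diff, kind="stable")[:n_keep]
    return kept[most_uncertain(probs_a[kept], batch_size)]


# Toy pool: points 0-2 stand in for the clean distribution, 3-5 for the noisy one.
probs_a = np.array([[0.90, 0.10], [0.80, 0.20], [0.60, 0.40],
                    [0.50, 0.50], [0.55, 0.45], [0.52, 0.48]])
# Assumed behavior: a second model agrees on clean points and flips on noisy ones.
probs_b = np.array([[0.88, 0.12], [0.82, 0.18], [0.62, 0.38],
                    [0.10, 0.90], [0.90, 0.10], [0.20, 0.80]])

naive = most_uncertain(probs_a, 3)
filtered = filtered_selection(probs_a, probs_b, 2)
print("uncertainty sampling picks:", naive.tolist())
print("after difference filter:", filtered.tolist())
```

In this toy pool, plain uncertainty sampling spends its whole budget on the noisy points because their predictions sit closest to uniform, while the filtered variant falls back to the most uncertain clean points. Whether a prediction-difference score separates noisy from clean data in practice depends on the models being compared, which the abstract leaves unspecified.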

Related content

Active learning

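Active learning is a subfield of machine learning (and, more broadly, of artificial intelligence); in statistics it is also known as query learning or optimal experimental design. The "learning module" and the "selection strategy" are the two basic and essential components of an active learning algorithm. In education, the same term refers to "a method of learning in which students are actively or experientially involved in the learning process and where there are different levels of active learning, depending on student involvement" (Bonwell & Eison 1991). Bonwell & Eison (1991) state that "students participate when they are doing something besides passively listening." In a report for the Association for the Study of Higher Education (ASHE), the authors discuss a variety of methods for promoting active learning. They cite literature indicating that, to learn, students must do more than just listen: they must read, write, discuss, and engage in solving problems. This process relates to the three learning domains of knowledge, skills, and attitudes (KSA), a taxonomy of learning behaviors that can be thought of as "the goals of the learning process." In particular, students must engage in higher-order thinking tasks such as analysis, synthesis, and evaluation.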