An interesting but little-studied question in active learning is that of sample reusability: to what extent can samples selected for one learner be reused by another? This paper explains why sample reusability is of practical interest, why reusability can be a problem, how it could be improved by importance-weighted active learning, and which obstacles to universal reusability remain. With theoretical arguments and practical demonstrations, this paper argues that universal reusability is impossible. Because every active learning strategy must undersample some areas of the sample space, learners that depend on the samples in those areas will learn more from a random sample selection. This paper describes several experiments with importance-weighted active learning that show the impact of the reusability problem in practice. The experiments confirmed that universal reusability does not exist, although in some cases (on some datasets and with some pairs of classifiers) samples are reusable. Finally, this paper explores the conditions that could guarantee reusability between two classifiers.
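The importance-weighting idea behind importance-weighted active learning can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the paper's experimental code: each pool point is queried with some probability p and, if queried, given weight 1/p, so the weighted error count stays an unbiased estimate for any classifier as long as p never drops to zero. The function names `iwal_select` and `weighted_error`, the 1-D pool, the boundary-focused query rule, and the floor `p_min` are all hypothetical choices made for this demonstration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical names and data, for illustration only.
Example = Tuple[float, int]           # (feature, label)
Weighted = Tuple[float, int, float]   # (feature, label, importance weight)


def iwal_select(pool: Sequence[Example],
                query_prob: Callable[[float], float],
                p_min: float,
                rng: random.Random) -> List[Weighted]:
    """Query each point with probability p in [p_min, 1]; weight queried points by 1/p."""
    selected: List[Weighted] = []
    for x, y in pool:
        p = max(p_min, min(1.0, query_prob(x)))
        # rng.random() < 0 is never true, so 1/p is never computed for p == 0.
        if rng.random() < p:
            selected.append((x, y, 1.0 / p))
    return selected


def weighted_error(selected: Sequence[Weighted],
                   predict: Callable[[float], int],
                   pool_size: int) -> float:
    """Importance-weighted estimate of the error rate of `predict` on the whole pool."""
    return sum(w for x, y, w in selected if predict(x) != y) / pool_size


# Hypothetical setup: labels flip at 0.5, and the first learner's strategy
# only queries points near its own decision boundary at 0.5.
pool = [(i / 1000, int(i / 1000 > 0.5)) for i in range(1000)]


def near_boundary(x: float) -> float:
    return 1.0 if abs(x - 0.5) < 0.05 else 0.0


def other_learner(x: float) -> int:
    # A second learner whose boundary (0.2) lies in the undersampled region.
    return int(x > 0.2)


true_err = sum(other_learner(x) != y for x, y in pool) / len(pool)

rng = random.Random(0)
# With a floor p_min > 0 every region keeps a nonzero query probability, so the
# estimate is unbiased (mean_est should be close to true_err), but single runs are noisy.
estimates = [weighted_error(iwal_select(pool, near_boundary, 0.05, rng),
                            other_learner, len(pool)) for _ in range(200)]
mean_est = sum(estimates) / len(estimates)

# With p_min = 0, points between 0.2 and 0.45 are never queried, so the second
# learner's errors there are invisible and its error is underestimated.
biased_est = weighted_error(iwal_select(pool, near_boundary, 0.0, rng),
                            other_learner, len(pool))

print(f"true error {true_err:.3f}, mean IW estimate {mean_est:.3f}, "
      f"single-run range [{min(estimates):.3f}, {max(estimates):.3f}], "
      f"estimate without floor {biased_est:.3f}")
```

The run with `p_min = 0.05` shows the trade-off the abstract points to: the 1/p weights remove the bias for the second learner, but because few points are drawn from the region that learner depends on, individual estimates vary widely. With `p_min = 0` that region is never sampled at all, and the selected samples cannot reveal the second learner's errors there.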