Active learning is a machine learning paradigm that aims to reduce the amount of labeled data needed to train a classifier. Its overall principle is to sequentially select the most informative data points, which amounts to determining the uncertainty of regions of the input space. The main challenge lies in building a procedure that is both computationally efficient and backed by appealing theoretical properties; most current methods satisfy only one of these two requirements. In this paper, we use classification with rejection in a novel way to estimate the uncertain regions. We provide an active learning algorithm and prove its theoretical benefits under classical assumptions. In addition to the theoretical results, we carry out numerical experiments on synthetic and non-synthetic datasets. These experiments provide empirical evidence that the use of rejection arguments in our active learning algorithm is beneficial and yields good performance in a variety of statistical settings.
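To make the principle concrete, the following is a minimal sketch of pool-based active learning in which a reject option marks the uncertain region. It is not the algorithm of this paper: the k-nearest-neighbor estimator, the threshold `delta`, and the names `knn_eta`, `rejected`, and `active_learn` are all assumptions chosen for illustration. The idea is to estimate eta(x) = P(Y = 1 | X = x) from the labels gathered so far, abstain wherever the estimate is within `delta` of 1/2, and spend new label queries only inside that abstention region.

```python
import numpy as np


def knn_eta(x_lab, y_lab, x_query, k=5):
    """k-NN estimate of eta(x) = P(Y = 1 | X = x) for 1-D inputs (illustrative choice)."""
    k = min(k, len(x_lab))
    dist = np.abs(x_query[:, None] - x_lab[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_lab[nearest].mean(axis=1)


def rejected(eta_hat, delta):
    """Reject option: abstain where the estimate is within delta of 1/2."""
    return np.abs(eta_hat - 0.5) <= delta


def active_learn(x_pool, oracle, n_init=10, batch=10, rounds=5, delta=0.25, seed=0):
    """Query labels only for pool points that fall in the current rejection region."""
    rng = np.random.default_rng(seed)
    labeled = np.zeros(len(x_pool), dtype=bool)
    labeled[rng.choice(len(x_pool), size=n_init, replace=False)] = True
    y = np.full(len(x_pool), -1)  # -1 marks "label not requested"
    y[labeled] = oracle(x_pool[labeled])
    for _ in range(rounds):
        eta_hat = knn_eta(x_pool[labeled], y[labeled], x_pool)
        candidates = np.flatnonzero(rejected(eta_hat, delta) & ~labeled)
        if candidates.size == 0:
            break  # no uncertain unlabeled point is left
        pick = rng.choice(candidates, size=min(batch, candidates.size), replace=False)
        y[pick] = oracle(x_pool[pick])
        labeled[pick] = True
    return labeled, y


# Hypothetical toy problem: the true decision boundary sits at x = 0.5.
x_pool = np.linspace(0.0, 1.0, 200)
labeled, y = active_learn(x_pool, lambda x: (x > 0.5).astype(int))
print(labeled.sum(), "labels requested out of", len(x_pool))
```

In this toy setting the later queries tend to concentrate near the boundary, since that is where the estimated eta stays close to 1/2. A larger `delta` widens the region considered uncertain and therefore spends labels more broadly.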