Data acquisition for machine learning is often costly. To build a high-performance prediction model from fewer data, the difficulty of predicting a candidate point is often used as the acquisition function when choosing which new data point to add. This difficulty is referred to as the uncertainty of the prediction model. We propose an uncertainty estimation method, named Distance-weighted Class Impurity, that makes no explicit use of a prediction model. The method estimates uncertainty at a location from the distances to, and the class impurity of, the labeled data around that location. We compare it on active learning tasks with several uncertainty estimation methods that are based on prediction models, and verify that Distance-weighted Class Impurity works effectively regardless of the prediction model used.
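The abstract does not state the formula, so the following is only a minimal sketch of how a model-free, distance-weighted impurity score could drive an acquisition step. The inverse-distance weighting, the Gini impurity, and the names `distance_weighted_impurity` and `select_next` are all assumptions made for illustration, not the definition used in the paper.

```python
import math


def distance_weighted_impurity(x, labeled_points, labels, eps=1e-9):
    """Hypothetical score: Gini impurity of distance-weighted class shares around x.

    Assumption: each labeled point votes for its class with weight
    1 / (distance + eps); the paper's actual weighting may differ.
    """
    if not labeled_points:
        raise ValueError("at least one labeled point is required")
    weight_per_class = {}
    for point, label in zip(labeled_points, labels):
        w = 1.0 / (math.dist(x, point) + eps)
        weight_per_class[label] = weight_per_class.get(label, 0.0) + w
    total = sum(weight_per_class.values())
    # Gini impurity: 0 when one class dominates, larger when classes are mixed.
    return 1.0 - sum((w / total) ** 2 for w in weight_per_class.values())


def select_next(pool, labeled_points, labels):
    """Return the index of the pool point with the highest score, plus all scores."""
    scores = [distance_weighted_impurity(x, labeled_points, labels) for x in pool]
    best = max(range(len(pool)), key=scores.__getitem__)
    return best, scores
```

Under these assumptions, a pool point lying midway between two differently labeled points scores highest, so it would be queried first. No classifier is trained at any step, which is the property the abstract emphasizes.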