We present a novel approach to the core-set/instance selection problem in machine learning. Our approach builds on recent results on (proportional) representation in approval-based multi-winner elections. In our model, instances play a double role as voters and candidates. The approval set of each instance in the training set (acting as a voter) is defined using the concept of local set, which already exists in the literature. We then select the election winners with a representative voting rule; these winners are the data instances kept in the reduced training set. We evaluate our approach in two experiments involving neural network classifiers and classic machine learning classifiers (KNN and SVM). Our experiments show that, in several cases, our approach outperforms state-of-the-art methods, and the differences are statistically significant.
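The pipeline described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the local set of an instance is taken to be the same-class instances strictly closer to it than its nearest enemy (the nearest instance of a different class), the approval set of a voter is taken to be exactly its local set, and sequential PAV stands in for the unspecified representative voting rule. The function names `local_set` and `seq_pav` and the toy data are hypothetical.

```python
import math


def local_set(i, X, y):
    """Assumed definition: same-class points strictly closer to X[i]
    than X[i]'s nearest enemy (nearest point of a different class)."""
    enemy_dist = min(
        (math.dist(X[i], X[j]) for j in range(len(X)) if y[j] != y[i]),
        default=math.inf,  # no enemy at all: every same-class point qualifies
    )
    return {
        j for j in range(len(X))
        if y[j] == y[i] and math.dist(X[i], X[j]) < enemy_dist
    }


def seq_pav(approvals, k):
    """Sequential PAV (a stand-in rule, not necessarily the authors' choice).
    approvals[v] is the set of candidates approved by voter v.
    Each round adds the candidate with the largest marginal PAV score."""
    n = len(approvals)
    winners = []
    satisfied = [0] * n  # number of winners each voter already approves
    for _ in range(min(k, n)):
        best, best_score = None, -1.0
        for c in range(n):
            if c in winners:
                continue
            score = sum(
                1.0 / (1 + satisfied[v])
                for v in range(n) if c in approvals[v]
            )
            if score > best_score:  # ties go to the lowest index
                best, best_score = c, score
        winners.append(best)
        for v in range(n):
            if best in approvals[v]:
                satisfied[v] += 1
    return winners


# Toy data (hypothetical): two well-separated classes of three points each.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]

# Every instance acts as a voter; its approval set is its local set.
approvals = [local_set(i, X, y) for i in range(len(X))]

# Elect a committee of size 2: these indices form the reduced training set.
winners = seq_pav(approvals, k=2)
reduced_X = [X[i] for i in winners]
reduced_y = [y[i] for i in winners]
```

On this toy data each class forms one block of mutually approving voters, so a representative rule should keep one instance per class rather than two from the same class. The committee size `k` is a free parameter here; how the reduced set size is chosen in the actual method is not stated in the abstract.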