In this paper, we extend our PrInDT method (Weihs & Buschfeld 2021a) to cover undersampling with different percentages of the smaller and the larger classes (psmall and plarge), stratification of predictors, variation of the prediction threshold, and measurement of variable importance in ensembles. An application of these methods to a linguistic example suggests the following: 1. In undersampling, a careful selection of the percentages plarge and psmall is important for building models with high balanced accuracies; 2. Stratification of predictors does not substantially improve balanced accuracies; 3. Lowering the prediction threshold for the smaller class is an alternative to undersampling because it increases the likelihood of the smaller class being predicted. Finally, we introduce a method for ranking predictor importance that allows for a straightforward interpretation of the results.
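To make the two rebalancing ideas in the abstract concrete, the following is a minimal Python sketch, not the PrInDT implementation. It assumes that psmall and plarge are the fractions of the smaller and the larger class kept in an undersample, and that a model outputs a probability for the smaller class. The function names (`undersample`, `predict_with_threshold`, `balanced_accuracy`), the class labels, and the toy probabilities are hypothetical and chosen only for illustration.

```python
import random


def undersample(labels, small_label, psmall, plarge, rng):
    """Return sorted indices of a subsample that keeps a fraction psmall of the
    smaller class and a fraction plarge of the larger class (assumed reading
    of the two percentages)."""
    small_idx = [i for i, y in enumerate(labels) if y == small_label]
    large_idx = [i for i, y in enumerate(labels) if y != small_label]
    n_small = round(psmall * len(small_idx))
    n_large = round(plarge * len(large_idx))
    return sorted(rng.sample(small_idx, n_small) + rng.sample(large_idx, n_large))


def predict_with_threshold(prob_small, threshold=0.5):
    """Predict the smaller class whenever its probability reaches the threshold."""
    return ["small" if p >= threshold else "large" for p in prob_small]


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Undersampling: keep all 10 "small" cases but only 20% of the 90 "large" ones.
labels = ["small"] * 10 + ["large"] * 90
kept = undersample(labels, "small", psmall=1.0, plarge=0.2, rng=random.Random(0))
print(len(kept))

# Threshold variation on made-up probabilities for the smaller class.
y_true = ["small", "small", "small", "large", "large", "large"]
prob_small = [0.45, 0.35, 0.60, 0.20, 0.10, 0.40]
for t in (0.5, 0.3):
    y_pred = predict_with_threshold(prob_small, threshold=t)
    print(t, balanced_accuracy(y_true, y_pred))
```

In this toy data, lowering the threshold from 0.5 to 0.3 recovers two more cases of the smaller class at the cost of one misclassified case of the larger class, so balanced accuracy rises. Whether such a trade-off pays off on real data depends on the model and the class distribution.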