Sequential Permutation Testing of Random Forest Variable Importance Measures

Hypothesis testing of random forest (RF) variable importance measures (VIMP) remains the subject of ongoing research. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional assumptions are based on empirical evidence. Other formal tests have been derived analytically under regularity conditions. However, these approaches can be computationally expensive or even practically infeasible. The same problem arises with non-parametric permutation tests, which are nevertheless distribution-free and can be applied generically to any type of RF and VIMP. Building on this advantage, it is proposed here to use sequential permutation tests and sequential p-value estimation to reduce the high computational costs associated with conventional permutation tests. The popular and widely used permutation VIMP serves as a practical and relevant application example. The results of simulation studies confirm that the theoretical properties of the sequential tests hold: the type-I error probability is controlled at the nominal level, and high power is maintained with considerably fewer permutations than conventional permutation testing requires. The numerical stability of the methods is investigated in two additional application studies. In summary, theoretically sound sequential permutation testing of VIMP is possible at greatly reduced computational costs. Recommendations for application are given. An implementation is provided in the accompanying R package rfvimptest. The approach can also be easily applied to any kind of prediction model.
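To make the idea of sequential p-value estimation concrete, the following is a minimal sketch of one well-known early-stopping rule (in the style of Besag and Clifford): permutations are drawn one at a time, and sampling stops as soon as a fixed number of permuted statistics reach or exceed the observed one. It is not taken from the rfvimptest package and is not necessarily the rule used there. The function names, the defaults `h=8` and `max_perms=499`, and the toy mean-difference statistic (standing in for a VIMP) are all assumptions made for illustration.

```python
import random


def sequential_p_value(observed, draw_null, h=8, max_perms=499):
    """Estimate a permutation p-value with early stopping.

    observed  : statistic computed on the original data
    draw_null : zero-argument callable returning one permuted statistic
    h         : number of exceedances that triggers early stopping (assumed default)
    max_perms : upper limit on the number of permutations (assumed default)

    Returns (p_value, permutations_used).
    """
    exceed = 0
    for used in range(1, max_perms + 1):
        if draw_null() >= observed:
            exceed += 1
            if exceed == h:
                # Enough evidence that the observed value is unremarkable: stop early.
                return h / used, used
    # Limit reached: conventional permutation p-value estimate.
    return (exceed + 1) / (max_perms + 1), max_perms


def abs_mean_diff(values, labels):
    """Toy statistic: absolute difference in group means (a stand-in for a VIMP)."""
    a = [v for v, g in zip(values, labels) if g == 1]
    b = [v for v, g in zip(values, labels) if g == 0]
    return abs(sum(a) / len(a) - sum(b) / len(b))


rng = random.Random(0)
labels = [0] * 20 + [1] * 20
noise = [rng.gauss(0, 1) for _ in labels]            # unrelated to the labels
signal = [v + 3 * g for v, g in zip(noise, labels)]  # shifted in group 1


def make_null(values):
    """Return a sampler that recomputes the statistic on shuffled labels."""
    def draw():
        shuffled = labels[:]
        rng.shuffle(shuffled)
        return abs_mean_diff(values, shuffled)
    return draw


p_signal, n_signal = sequential_p_value(abs_mean_diff(signal, labels), make_null(signal))
p_noise, n_noise = sequential_p_value(abs_mean_diff(noise, labels), make_null(noise))
print(f"signal: p={p_signal:.4f} after {n_signal} permutations")
print(f"noise:  p={p_noise:.4f} after {n_noise} permutations")
```

Under this rule, a variable whose observed statistic is often matched by permuted values tends to stop after only a few permutations, while a variable with a strong effect typically runs up to the limit. This is where the computational savings come from: most of the permutation budget is spent only on the variables for which a precise small p-value matters.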
