This paper presents a model-agnostic ensemble approach for supervised learning. The proposed approach alternates between (1) learning an ensemble of models using a parametric version of the Random Subspace approach, in which feature subsets are sampled according to Bernoulli distributions, and (2) identifying the parameters of the Bernoulli distributions that minimize the generalization error of the ensemble model. Parameter optimization is rendered tractable by an importance sampling approach that can estimate the expected model output for any given parameter set, without the need to learn new models. While the degree of randomization is controlled by a hyper-parameter in the standard Random Subspace method, it is tuned automatically in our parametric version. Furthermore, model-agnostic feature importance scores can be easily derived from the trained ensemble model. We show the good performance of the proposed approach, both in terms of prediction and feature ranking, on simulated and real-world datasets. We also show that our approach can be successfully used for the reconstruction of gene regulatory networks.
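The two ingredients named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the base learner (ordinary least squares), the ensemble size, and all function names are assumptions for illustration. The key idea shown is that once an ensemble has been trained with feature-subset masks drawn from Bernoulli(p), the expected ensemble output under any other parameter vector q can be estimated by importance-sampling reweighting of the existing models, without refitting.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    # Ordinary least squares with intercept (stand-in base learner).
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_linear(w, X):
    A = np.column_stack([X, np.ones(len(X))])
    return A @ w

def train_ensemble(X, y, p, M=50):
    """Draw M feature masks ~ Bernoulli(p) and fit one model per mask."""
    n, d = X.shape
    masks, models = [], []
    for _ in range(M):
        m = rng.random(d) < p
        if not m.any():
            # Fallback for an all-empty mask (slightly biases the
            # sampling density; ignored in this sketch).
            m[rng.integers(d)] = True
        masks.append(m)
        models.append(fit_linear(X[:, m], y))
    return masks, models

def ensemble_predict(masks, models, X, p, q=None):
    """Weighted-average prediction. With q=None this is the plain
    ensemble mean; with q given, each model is reweighted by the
    likelihood ratio of its mask under Bernoulli(q) vs Bernoulli(p),
    estimating the expected output under q without new models."""
    preds = np.array([predict_linear(w, X[:, m])
                      for m, w in zip(masks, models)])
    if q is None:
        wts = np.ones(len(models))
    else:
        wts = np.array([np.prod(np.where(m, q / p, (1 - q) / (1 - p)))
                        for m in masks])
    wts = wts / wts.sum()
    return wts @ preds

# Toy demo: the target depends only on feature 0.
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 0.1 * rng.normal(size=200)
p = np.full(5, 0.5)
masks, models = train_ensemble(X, y, p)
mse_uniform = np.mean((ensemble_predict(masks, models, X, p) - y) ** 2)
# Shift the Bernoulli parameters toward the relevant feature and
# re-estimate the ensemble output by reweighting alone.
q = np.array([0.95, 0.05, 0.05, 0.05, 0.05])
mse_shifted = np.mean((ensemble_predict(masks, models, X, p, q) - y) ** 2)
```

In the paper's procedure this reweighting is what makes the optimization over the Bernoulli parameters tractable: candidate parameter vectors q can be scored against the generalization error using the already-trained ensemble, and only the sampling distribution is updated between rounds.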