In this paper, we introduce a new model-building algorithm called self-validating ensemble modeling, or SVEM. The method makes it possible to fit validated predictive models to the relatively small data sets typically generated by designed experiments. We focus on predictive accuracy, which is often the most important criterion in studies in the biopharmaceutical industry. To fit validated predictive models, SVEM combines a unique weighting scheme applied to the responses with fractionally weighted bootstrapping to generate a large ensemble of fitted models. The weighting scheme allows the original data to serve as both a training set and a validation set. The method is general and works with most model selection algorithms. Through extensive simulation studies and a case study, we demonstrate that SVEM generates models with lower prediction error than more traditional statistical approaches based on fitting a single model, when the true model has low sparsity and the number of experimental runs is small.
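The weighting idea described above can be illustrated with a minimal sketch. The abstract does not give the weight formulas, so the code below makes explicit assumptions: each run receives a training weight -log(u) and a validation weight -log(1 - u) from the same uniform draw u, so that a run weighted heavily for training is weighted lightly for validation. A small ridge penalty grid stands in for the model selection algorithm, and the function name `svem_fit_predict` and its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np


def svem_fit_predict(X, y, X_new, n_boot=200,
                     penalties=(0.01, 0.1, 1.0, 10.0), seed=0):
    """Illustrative self-validating ensemble (a sketch, not the authors' code).

    Assumptions: anti-correlated exponential weights built from one uniform
    draw per run, and ridge regression as a stand-in model selection step.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Design matrices with an intercept column.
    A = np.column_stack([np.ones(n), X])
    A_new = np.column_stack([np.ones(X_new.shape[0]), X_new])
    p = A.shape[1]
    # Penalty matrix that leaves the intercept unpenalized.
    D = np.eye(p)
    D[0, 0] = 0.0

    preds = np.zeros((n_boot, A_new.shape[0]))
    for b in range(n_boot):
        # One uniform draw per run, kept away from 0 and 1 so logs stay finite.
        u = np.clip(rng.uniform(size=n), 1e-12, 1.0 - 1e-12)
        w_train = -np.log(u)        # fractional training weights (assumed form)
        w_valid = -np.log1p(-u)     # anti-correlated validation weights

        best_err, best_beta = np.inf, None
        for lam in penalties:
            # Weighted ridge fit using the training weights.
            AtW = A.T * w_train
            beta = np.linalg.solve(AtW @ A + lam * D, AtW @ y)
            # Score the fit on the same runs, using the validation weights.
            err = np.sum(w_valid * (y - A @ beta) ** 2)
            if err < best_err:
                best_err, best_beta = err, beta
        preds[b] = A_new @ best_beta

    # The ensemble prediction is the average over all bootstrap fits.
    return preds.mean(axis=0)
```

Every run contributes to both fitting and validation in every bootstrap iteration, which is what lets a small designed experiment be used without holding out any runs. Any selection procedure that accepts observation weights could replace the ridge grid in this sketch.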