Improving existing widely-adopted prediction models is often a more efficient and robust route to progress than training new models from scratch. Existing models may (a) incorporate complex mechanistic knowledge, (b) leverage proprietary information, and (c) have surmounted barriers to adoption. Compared to model training, model improvement and modification receive little attention. In this paper, we propose a general approach to model improvement: we combine gradient boosting with any previously developed model to improve performance while retaining important existing characteristics. As an example, we consider Mendelian models, which use family pedigrees and the health histories of family members to estimate the probability of carrying genetic mutations that confer susceptibility to disease. Using simulations, we show that integrating gradient boosting with an existing Mendelian model can produce an improved model that outperforms both the original model and a model built using gradient boosting alone. We illustrate the approach on genetic testing data from the USC-Stanford Cancer Genetics Hereditary Cancer Panel (HCP) study.
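
One way to picture combining gradient boosting with a previously developed model is to start the boosting iterations from that model's predicted log-odds instead of from a constant. The sketch below illustrates this idea only; it is an assumption about how such a combination might look, not the procedure used in the paper. The function names (`boost_existing_model`, `fit_stump`, `predict`), the use of regression stumps, the learning rate, and the synthetic data are all hypothetical, and the "existing model" is a stand-in logistic formula rather than a Mendelian model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p))

def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_stump(X, r):
    """Least-squares regression stump (one split) fit to residuals r."""
    best = None
    for j in range(X.shape[1]):
        # Drop the largest value so both sides of the split are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return None if best is None else best[1:]

def boost_existing_model(X, y, p_existing, n_rounds=30, lr=0.1):
    """Gradient boosting on log-loss, initialised at the existing model's log-odds."""
    F = logit(p_existing)               # assumption: existing model enters as the starting score
    stumps = []
    for _ in range(n_rounds):
        resid = y - sigmoid(F)          # negative gradient of log-loss with respect to F
        stump = fit_stump(X, resid)
        if stump is None:               # no valid split (all features constant)
            break
        j, t, lv, rv = stump
        F += lr * np.where(X[:, j] <= t, lv, rv)
        stumps.append(stump)
    return stumps

def predict(X, p_existing, stumps, lr=0.1):
    """Existing model's log-odds plus the boosted corrections, mapped to a probability."""
    F = logit(p_existing)
    for j, t, lv, rv in stumps:
        F = F + lr * np.where(X[:, j] <= t, lv, rv)
    return sigmoid(F)

# Synthetic illustration (not data from the paper): the stand-in existing
# model uses feature 0 only, while the outcome also depends on feature 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (rng.random(300) < sigmoid(-1.0 + 1.5 * X[:, 0] + 1.0 * X[:, 1])).astype(float)
p_existing = sigmoid(-1.0 + 1.5 * X[:, 0])

stumps = boost_existing_model(X, y, p_existing)
print("training log-loss, existing model:", log_loss(y, p_existing))
print("training log-loss, boosted model: ", log_loss(y, predict(X, p_existing, stumps)))
```

With zero boosting rounds, `predict` returns the existing model's probabilities unchanged, which is one sense in which such a combination retains existing characteristics. The printed losses are computed on the training data, so any comparison of models would still need held-out data.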