Prediction of time-to-event data often suffers from low event rates, small sample sizes, high dimensionality and low signal-to-noise ratios. Incorporating published prediction models from large-scale studies can be expected to improve prognostic prediction on internal individual-level time-to-event data. However, existing integration approaches typically assume that the underlying distributions of the external and internal data sources are similar, an assumption that often fails. To address heterogeneity between data sources, together with data-sharing and privacy constraints, we propose a discrete failure time modeling procedure that uses a discrete hazard-based Kullback-Leibler discriminatory information measure to quantify the discrepancy between the published models and the internal dataset. Simulations show the advantage of the proposed method over approaches based solely on the internal data or solely on the published models. We apply the proposed method to improve prediction performance on a kidney transplant dataset from a local hospital by integrating this small-scale dataset with published survival models obtained from the national transplant registry.
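The abstract does not spell out the estimator, so the following is only a minimal sketch of one way a hazard-based Kullback-Leibler term could enter a discrete failure time model, not the authors' procedure. It assumes a logistic link for the discrete hazard, a Bernoulli Kullback-Leibler divergence between the published hazard and the fitted hazard in each at-risk interval, and a tuning weight `eta` that controls how much the published model is trusted. The function names `expand_person_period` and `fit_kl_discrete_hazard`, and the inputs `h_ext` and `eta`, are hypothetical. Under these assumptions, minimizing the negative log-likelihood plus `eta` times the divergence is equivalent to fitting a logistic regression to the blended response `(y + eta * h_ext) / (1 + eta)`, which the code solves with Newton steps.

```python
import numpy as np


def expand_person_period(times, events, X, n_intervals):
    """Expand one row per subject into one row per interval at risk.

    times  : observed discrete times, integers in 1..n_intervals
    events : 1 if the event was observed at `times`, 0 if censored
    X      : 2-D array of covariates, one row per subject
    Returns the design matrix (interval indicators + covariates)
    and the binary event indicator for each person-period row.
    """
    rows, y = [], []
    for t, d, x in zip(times, events, X):
        for k in range(1, t + 1):
            onehot = np.zeros(n_intervals)
            onehot[k - 1] = 1.0  # interval-specific baseline term
            rows.append(np.concatenate([onehot, x]))
            y.append(1.0 if (d == 1 and k == t) else 0.0)
    return np.array(rows), np.array(y)


def fit_kl_discrete_hazard(Z, y, h_ext, eta, n_iter=50, ridge=1e-6):
    """Newton iterations for an assumed KL-penalized logistic hazard model.

    h_ext : hazards predicted by the published model for each row of Z
    eta   : weight on the KL term (0 uses internal data only; large
            values pull the fit toward the published hazards)
    """
    # Assumed equivalence: the penalized objective has the same minimizer
    # as a logistic regression on this blended response.
    y_tilde = (y + eta * h_ext) / (1.0 + eta)
    theta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        h = 1.0 / (1.0 + np.exp(-Z @ theta))
        grad = Z.T @ (h - y_tilde)
        # Small ridge on the Hessian only, for numerical stability.
        hess = (Z * (h * (1.0 - h))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        theta -= np.linalg.solve(hess, grad)
    return theta
```

In this sketch the internal site needs only the published model's predicted hazards, not the external individual-level data, which is in the spirit of the data-sharing and privacy constraints mentioned above. How `eta` would be chosen (for example, by cross-validation on the internal data) is left open here.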