Difficulty in identifying cancer stage in health care claims data has limited research on oncology quality of care and health outcomes. We fit prediction algorithms for classifying lung cancer stage into three classes (stages I/II, stage III, and stage IV) using claims data, and then demonstrate a method for incorporating the classification uncertainty into outcomes estimation. Leveraging set-valued classification and split conformal inference, we show how a fixed algorithm developed in one cohort of data may be deployed in another, while rigorously accounting for uncertainty from the initial classification step. We demonstrate this process using SEER cancer registry data linked with Medicare claims data.
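To make the idea of set-valued classification with split conformal inference concrete, the following is a minimal sketch of one common construction, not necessarily the procedure used in this work. It assumes a fixed classifier that outputs class probabilities for the three stage groups, and it replaces that classifier with synthetic probabilities; the function names `conformal_threshold`, `prediction_sets`, and `simulate`, the score `1 - p(true class)`, and the miscoverage level `alpha = 0.1` are all illustrative assumptions. No SEER-Medicare data are involved.

```python
import numpy as np

# Stage groups from the abstract; the integer labels 0, 1, 2 index this tuple.
STAGES = ("I/II", "III", "IV")


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal threshold from a held-out calibration cohort.

    The nonconformity score is 1 - (predicted probability of the true class).
    The threshold is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: sets must contain every class.
        return np.inf
    return np.sort(scores)[k - 1]


def prediction_sets(probs, qhat):
    """Boolean matrix: entry (i, c) is True if class c is in the set for case i."""
    return (1.0 - probs) <= qhat


# Synthetic stand-in for a fixed, already-fitted classifier (assumption).
rng = np.random.default_rng(0)


def simulate(n):
    labels = rng.integers(0, len(STAGES), size=n)
    logits = rng.normal(size=(n, len(STAGES)))
    logits[np.arange(n), labels] += 1.5  # make the true class more likely
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs, labels


cal_probs, cal_labels = simulate(1000)   # calibration cohort
new_probs, new_labels = simulate(2000)   # deployment cohort

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(new_probs, qhat)

# Empirical coverage: fraction of cases whose true stage lies in the set.
coverage = sets[np.arange(len(new_labels)), new_labels].mean()
avg_set_size = sets.sum(axis=1).mean()
print(f"threshold={qhat:.3f} coverage={coverage:.3f} avg set size={avg_set_size:.2f}")
```

Under exchangeability of the calibration and deployment cases, sets built this way contain the true stage with probability at least `1 - alpha` on average. A downstream outcomes analysis can then work with the set of plausible stages for each patient rather than a single predicted stage.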