Observational data in medicine arise from the complex interaction between patients and the healthcare system. The sampling process is often highly irregular and is itself informative. When such data are used to develop prediction models, this phenomenon is often ignored, leading to sub-optimal performance and poor generalisability when clinical practices evolve. We propose a multi-task recurrent neural network that models three dimensions of clinical presence (the longitudinal, inter-observation and missingness processes) in parallel with the survival outcome. On a prediction task using MIMIC-III laboratory tests, explicitly modelling these three processes improved performance compared with state-of-the-art predictive models (C-index at a 1-day horizon: 0.878). More importantly, the proposed approach was more robust to changes in clinical presence, as shown by comparing performance between patients admitted on weekdays and those admitted on weekends. This analysis demonstrates the importance of studying and leveraging clinical presence to improve performance and to create more transportable clinical models.
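The abstract does not specify how the three clinical presence processes are encoded. The following is a minimal sketch of how next-step targets for each process could be derived from an irregularly sampled series of laboratory tests. It assumes each observation is stored as a timestamp in hours plus a dictionary of test values. The function name `clinical_presence_targets` and this record format are hypothetical, and the sketch is not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record format (assumption): (hours since admission, {test_name: value})
Observation = Tuple[float, Dict[str, float]]


def clinical_presence_targets(
    series: List[Observation], tests: List[str]
) -> List[dict]:
    """Build illustrative next-step targets for the three clinical presence
    processes, one entry per observation that has a successor."""
    series = sorted(series, key=lambda obs: obs[0])
    targets = []
    for (t_now, _), (t_next, values_next) in zip(series, series[1:]):
        # Missingness process: 1 if the test was measured at the next observation
        mask = [1 if name in values_next else 0 for name in tests]
        # Longitudinal process: next observed values, None where not measured
        values: List[Optional[float]] = [values_next.get(name) for name in tests]
        targets.append({
            "values": values,
            "mask": mask,
            # Inter-observation process: time elapsed until the next observation
            "gap": t_next - t_now,
        })
    return targets


series = [
    (0.0, {"glucose": 5.4, "sodium": 140.0}),
    (6.0, {"glucose": 7.1}),
    (30.0, {"sodium": 138.0}),
]
targets = clinical_presence_targets(series, ["glucose", "sodium"])
```

In a multi-task setup, a shared recurrent encoder could predict `values` (masked by `mask`), `mask` and `gap` alongside the survival outcome. How these losses are weighted is a design choice that this sketch does not address.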