Claims data, which contain medical codes, service information, and incurred expenditures, can be a good resource for estimating an individual's health condition and medical risk level. In this study, we developed the Transformer-based Multimodal AutoEncoder (TMAE), an unsupervised learning framework that learns efficient patient representations by encoding meaningful information from claims data. TMAE is motivated by the practical need in healthcare to stratify patients into risk levels in order to improve care delivery and management. Compared to previous approaches, TMAE is able to 1) model inpatient, outpatient, and medication claims collectively, 2) handle irregular time intervals between medical events, 3) alleviate the sparsity issue of rare medical codes, and 4) incorporate medical expenditure information. We trained TMAE on a real-world pediatric claims dataset containing more than 600,000 patients and compared its performance with various approaches on two clustering tasks. Experimental results demonstrate that TMAE outperforms all baselines. We also conducted multiple downstream applications to illustrate the effectiveness of our framework. The promising results confirm that the TMAE framework scales to large claims data and generates efficient patient embeddings for risk stratification and analysis.