MedFACT: Modeling Medical Feature Correlations in Patient Health Representation Learning via Feature Clustering

In healthcare prediction tasks, it is essential to exploit the correlations between medical features in order to learn better patient health representations. Existing methods either estimate feature correlations from data alone or improve the estimates by introducing task-specific medical knowledge. However, the former struggle to estimate feature correlations when training samples are insufficient, and the latter do not generalize to other tasks because they rely on specific knowledge. Medical research has shown that not all medical features are strongly correlated. To address these issues, we group strongly correlated features and learn feature correlations in a group-wise manner, which reduces learning complexity without losing generality. In this paper, we propose MedFACT, a general patient health representation learning framework. We estimate correlations by measuring the similarity between the temporal patterns of medical features with kernel methods, and cluster strongly correlated features into groups. Each feature group is then formulated as a correlation graph, and we employ graph convolutional networks to perform group-wise feature interactions for better representation learning. Experiments on two real-world datasets demonstrate the superiority of MedFACT. The medical findings it discovers are also confirmed by the literature, providing valuable medical insights and explanations.
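The pipeline described in the abstract (kernel similarity between temporal patterns, feature grouping, group-wise graph convolution) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract only says "kernel methods" and "cluster", so the RBF kernel, the threshold-based connected-component grouping, the single untrained graph-convolution layer, and all function names (`rbf_feature_similarity`, `threshold_groups`, `gcn_layer`) are assumptions made for illustration.

```python
import numpy as np


def rbf_feature_similarity(X, gamma=None):
    """Similarity between feature time series (assumed RBF kernel).

    X has shape (T, F): T time steps, F medical features.
    Returns an (F, F) matrix with entries in (0, 1].
    """
    # Standardize each feature so the kernel compares temporal shape, not scale.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    P = Z.T                                   # (F, T): one row per feature
    sq = np.sum(P ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * P @ P.T, 0.0)
    if gamma is None:
        gamma = 1.0 / X.shape[0]              # assumed default bandwidth
    return np.exp(-gamma * d2)


def threshold_groups(S, threshold=0.5):
    """Group features as connected components of the graph S_ij >= threshold.

    A simple stand-in for the clustering step; the threshold is hypothetical.
    """
    n = S.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.nonzero(S[i] >= threshold)[0]:
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^{-1/2} A D^{-1/2} H W).

    A is the group's correlation graph; its diagonal already acts as self-loops.
    """
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)


# Synthetic example: three features follow a sine pattern, two follow a trend.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 4.0 * np.pi, 48)
X = np.stack(
    [np.sin(t) + 0.05 * rng.normal(size=t.size) for _ in range(3)]
    + [t + 0.05 * rng.normal(size=t.size) for _ in range(2)],
    axis=1,
)                                             # shape (48, 5)

S = rbf_feature_similarity(X)
labels = threshold_groups(S)
print("group label per feature:", labels)

# Group-wise interaction: run the layer on each group's own correlation graph.
W = rng.normal(size=(X.shape[0], 4))          # random, untrained weights
for g in np.unique(labels):
    idx = np.nonzero(labels == g)[0]
    A = S[np.ix_(idx, idx)]
    H = X.T[idx]                              # node features: raw time series
    print("group", g, "output shape:", gcn_layer(A, H, W).shape)
```

Restricting the graph convolution to each group keeps every adjacency matrix small, which is one way to read the abstract's claim that group-wise learning reduces complexity. In a trained model the weights `W` would be learned jointly with the prediction task rather than sampled at random.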
