MD-Manifold: A Medical-Distance-Based Representation Learning Approach for Medical Concept and Patient Representation

Effectively representing medical concepts and patients is important for healthcare analytical applications. Representing medical concepts for healthcare analytical tasks requires incorporating medical domain knowledge and prior information from patient description data. Current methods, such as feature engineering and mapping medical concepts to standardized terminologies, have limitations in capturing the dynamic patterns from patient description data. Other embedding-based methods have difficulties in incorporating important medical domain knowledge and often require a large amount of training data, which may not be feasible for most healthcare systems. Our proposed framework, MD-Manifold, introduces a novel approach to medical concept and patient representation. It includes a new data augmentation approach, concept distance metric, and patient-patient network to incorporate crucial medical domain knowledge and prior data information. It then adapts manifold learning methods to generate medical concept-level representations that accurately reflect medical knowledge and patient-level representations that clearly identify heterogeneous patient cohorts. MD-Manifold also outperforms other state-of-the-art techniques in various downstream healthcare analytical tasks. Our work has significant implications in information systems research in representation learning, knowledge-driven machine learning, and using design science as middle-ground frameworks for downstream explorative and predictive analyses. Practically, MD-Manifold has the potential to create effective and generalizable representations of medical concepts and patients by incorporating medical domain knowledge and prior data information. It enables deeper insights into medical data and facilitates the development of new analytical applications for better healthcare outcomes.
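
The abstract does not specify the concept distance metric or which manifold learning method is adapted, so the following is only a minimal sketch of the general idea: define a knowledge-derived distance between medical concepts, then embed the concepts so that geometric distance approximates it. Everything concrete here is an assumption made for illustration and is not taken from the paper: the code strings in `CODES` are hypothetical, `tree_distance` is a stand-in metric based on shared code prefixes, and classical multidimensional scaling (MDS) is swapped in as one simple manifold learning technique.

```python
import math

# Hypothetical hierarchical codes, chosen only for illustration (not from the paper).
CODES = ["A01", "A02", "A10", "B01", "B02"]


def tree_distance(a, b):
    """Path length between two codes in a prefix tree (an assumed stand-in metric)."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return (len(a) - shared) + (len(b) - shared)


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def top_eigenpair(m, iters=500):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    v = [float(i + 1) for i in range(len(m))]  # fixed start keeps the run deterministic
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, matvec(m, v)))
    return eigenvalue, v


def classical_mds(dist, dims=2):
    """Embed points so Euclidean distances approximate the given distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into a Gram (inner-product) matrix.
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        lam, vec = top_eigenpair(gram)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(vec[i] * scale)
        # Deflate so the next pass finds the next eigenvector.
        gram = [[gram[i][j] - lam * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return coords


# Pairwise concept distances, then a 2-D embedding keyed by code.
DIST = [[tree_distance(a, b) for b in CODES] for a in CODES]
EMBEDDING = dict(zip(CODES, classical_mds(DIST)))
```

In this toy setup, codes that share a longer prefix end up closer together in `EMBEDDING` than codes from different branches. A patient-level representation could then be built on top of such concept vectors, but how MD-Manifold does this (including its data augmentation step and patient-patient network) is not described in the abstract and is not reproduced here.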
