Family history is considered a risk factor for many diseases because it implicitly captures shared genetic, environmental, and lifestyle factors. A nationwide electronic health record (EHR) system spanning multiple generations presents new opportunities for studying a connected network of medical histories for entire families. In this work, we present a graph-based deep learning approach for learning explainable, supervised representations of how each family member's longitudinal medical history influences a patient's disease risk. We demonstrate that this approach is beneficial for predicting 10-year disease onset for five complex disease phenotypes, compared to clinically inspired and deep learning baselines, in a nationwide EHR system comprising 7 million individuals with up to third-degree relatives. Using graph explainability techniques, we illustrate that a graph-based approach enables more personalized modeling of family information and disease risk by identifying the relatives and features that are important for prediction.