Cancer is responsible for millions of deaths worldwide every year. Although significant progress has been achieved in cancer medicine, many issues remain to be addressed to improve cancer therapy. Appropriate patient stratification is a prerequisite for selecting an appropriate treatment plan, as cancer patients are known to have heterogeneous genetic make-ups and phenotypic differences. In this study, building on deep phenotypic characterizations extractable from Mayo Clinic electronic health records (EHRs) and genetic test reports for a collection of cancer patients, we evaluated various graph neural networks (GNNs) that leverage a joint set of phenotypic and genetic features for cancer type classification. We applied and compared eight GNN models (AGNN, ChebNet, GAT, GCN, GIN, GraphSAGE, SGC, and TAGCN) on the Mayo Clinic cancer disease dataset, fine-tuned them on that dataset, and compared them with each other and with more conventional machine learning models (decision tree, gradient boosting, multi-layer perceptron, naive Bayes, and random forest), which we used as baselines. The assessment was done through the reported accuracy, precision, recall, and F1 values, as well as through F1 scores per disease class. Per our evaluation results, GNNs on average outperformed the baseline models, with mean statistics always being higher than those of the baseline models (0.849 vs 0.772 for accuracy, 0.858 vs 0.794 for precision, 0.843 vs 0.759 for recall, and 0.843 vs 0.855 for F1 score). Among GNNs, ChebNet, GraphSAGE, and TAGCN showed the best performance, while GAT showed the worst.
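The evaluation protocol described above (overall accuracy, precision, recall, and F1, plus F1 per disease class) can be illustrated with a minimal pure-Python sketch. The abstract does not state how the per-class values were averaged, so the unweighted (macro) average used here is an assumption, and the function names and cancer-type labels are hypothetical.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def summarize(y_true, y_pred):
    """Accuracy plus macro-averaged precision/recall/F1 (averaging is an assumption)."""
    scores = per_class_scores(y_true, y_pred)
    n_classes = len(scores)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(s[0] for s in scores.values()) / n_classes,
        "recall": sum(s[1] for s in scores.values()) / n_classes,
        "f1": sum(s[2] for s in scores.values()) / n_classes,
        "f1_per_class": {label: s[2] for label, s in scores.items()},
    }


# Hypothetical labels, for illustration only (not data from the study).
y_true = ["lung", "lung", "breast", "breast"]
y_pred = ["lung", "breast", "breast", "breast"]
report = summarize(y_true, y_pred)
```

Running the same summary for each GNN and each baseline, then averaging within each group, would give group-level means of the kind compared in the abstract.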