Prostate cancer is one of the most common cancers in men. In its advanced stages, tumours can metastasise to other tissues in the body, which is often fatal. In this thesis, we perform a genetic analysis of prostate cancer tumours at different metastatic sites using data science, machine learning and topological network analysis methods. We present a general procedure for pre-processing gene expression datasets and for pre-filtering significant genes with analytical methods. We then use machine learning models to filter key genes further and to classify tumours by secondary site. Finally, we perform gene co-expression network analysis and community detection on samples from the different prostate cancer secondary site types. Of the 14,379 genes considered, 13 are selected as those most strongly related to metastatic prostate cancer, achieving approximately 92% classification accuracy under cross-validation. In addition, we provide preliminary insights into the co-expression patterns of genes in the resulting gene co-expression networks.
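To make the idea of a gene co-expression network concrete, the following is a minimal sketch and not the procedure used in this thesis. It assumes that two genes are linked when the absolute Pearson correlation of their expression profiles reaches a cut-off (0.8 here, chosen arbitrarily), and it uses connected components of that graph as a crude stand-in for community detection. The function name `coexpression_modules`, the gene labels and the synthetic data are all hypothetical.

```python
import numpy as np


def coexpression_modules(expr, gene_names, threshold=0.8):
    """Group genes into connected components of a thresholded correlation graph.

    expr: array of shape (n_samples, n_genes).
    Two genes are linked when |Pearson r| >= threshold (an assumed cut-off).
    """
    corr = np.corrcoef(expr, rowvar=False)   # gene-by-gene correlation matrix
    adjacency = np.abs(corr) >= threshold
    np.fill_diagonal(adjacency, False)       # ignore self-correlation

    n_genes = len(gene_names)
    seen = [False] * n_genes
    modules = []
    for start in range(n_genes):
        if seen[start]:
            continue
        # Depth-first search collects every gene reachable from `start`.
        stack, component = [start], []
        seen[start] = True
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in np.flatnonzero(adjacency[node]):
                if not seen[neighbour]:
                    seen[neighbour] = True
                    stack.append(neighbour)
        modules.append([gene_names[i] for i in sorted(component)])
    return modules


# Synthetic example: two groups of genes driven by shared latent factors,
# plus one unrelated gene. None of this is data from the thesis.
rng = np.random.default_rng(0)
n_samples = 40
factor_a = rng.normal(size=n_samples)
factor_b = rng.normal(size=n_samples)


def noisy(signal):
    """Return the signal with a small amount of independent noise added."""
    return signal + 0.1 * rng.normal(size=n_samples)


expr = np.column_stack([
    noisy(factor_a), noisy(factor_a), noisy(factor_a),
    noisy(factor_b), noisy(factor_b),
    rng.normal(size=n_samples),
])
names = ["A1", "A2", "A3", "B1", "B2", "C1"]
modules = coexpression_modules(expr, names)
print(modules)
```

Connected components are used only because they need no extra dependencies. A modularity-based method would normally be preferred, since a single strong edge is enough to merge two otherwise separate components.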