With the increasing availability of large-scale cancer genomics datasets, machine learning approaches have played an important role in revealing novel insights into cancer development. Existing methods have shown encouraging performance in identifying genes that are predictive of cancer survival, but remain limited in modeling the distribution of gene expression. Here, we propose a novel method that can simulate the gene expression distribution at any given time point, including time points outside the range of those observed. To model the irregular time series in which each patient contributes a single observation, we integrate a neural ordinary differential equation (neural ODE) with Cox regression in our framework. We evaluated our method on eight cancer types from The Cancer Genome Atlas (TCGA) and observed a substantial improvement over existing approaches. Our visualizations and further analysis show how our method can be used to simulate expression at early cancer stages, offering the possibility of early cancer identification.
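To illustrate the two ingredients named above, here is a minimal sketch in NumPy. It is not the authors' implementation. The following are all illustrative assumptions: the one-hidden-layer MLP vector field (`mlp_field`), the fixed-step RK4 solver (`odeint_rk4`), the step count, and the Breslow-style Cox loss without tie handling (`cox_neg_log_partial_likelihood`). The solver evolves an expression vector from a reference time to any target time, including one outside the observed range. The Cox loss is the kind of survival objective that could be used to fit such a model.

```python
import numpy as np

def mlp_field(z, t, W1, b1, W2, b2):
    """Hypothetical vector field f(z, t): a one-hidden-layer MLP (time-independent here)."""
    return np.tanh(z @ W1 + b1) @ W2 + b2

def odeint_rk4(f, z0, t0, t1, n_steps=100, **params):
    """Integrate dz/dt = f(z, t) from t0 to t1 with fixed-step RK4.

    t1 may lie outside the observed time range (forward or backward in time).
    """
    h = (t1 - t0) / n_steps
    z, t = np.asarray(z0, dtype=float), float(t0)
    for _ in range(n_steps):
        k1 = f(z, t, **params)
        k2 = f(z + 0.5 * h * k1, t + 0.5 * h, **params)
        k3 = f(z + 0.5 * h * k2, t + 0.5 * h, **params)
        k4 = f(z + h * k3, t + h, **params)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z

def cox_neg_log_partial_likelihood(risk, time, event):
    """Breslow-style Cox negative log partial likelihood, averaged over events.

    Assumes no tied survival times (ties are not treated specially).
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event).astype(bool)
    order = np.argsort(-time)  # descending survival time
    r, e = risk[order], event[order]
    # Running log-sum-exp: log sum_{j: T_j >= T_i} exp(r_j) for each sorted i
    log_risk_set = np.logaddexp.accumulate(r)
    return -np.sum((r - log_risk_set)[e]) / max(int(e.sum()), 1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_genes, hidden = 5, 16
    params = dict(
        W1=rng.normal(scale=0.1, size=(n_genes, hidden)), b1=np.zeros(hidden),
        W2=rng.normal(scale=0.1, size=(hidden, n_genes)), b2=np.zeros(n_genes),
    )
    z0 = rng.normal(size=n_genes)  # hypothetical expression at reference time 0
    z_future = odeint_rk4(mlp_field, z0, 0.0, 2.0, **params)  # extrapolate forward
    print(z_future.shape)
```

The RK4 loop keeps the example dependency-free. A practical model would more likely use an adaptive solver with automatic differentiation so the vector field can be trained end to end against the survival loss.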