In The Cancer Genome Atlas (TCGA) dataset, many pairs of genes show interesting nonlinear dependencies that reveal important biological relationships and cancer subtypes. Analyzing such genomic data requires a detection procedure that is fast, powerful, and interpretable, especially in high-dimensional settings. We study nonlinear patterns in TCGA gene expression using Binary Expansion Testing (BET). We find many nonlinear patterns: some are driven by known cancer subtypes, and others are novel.