Identifying subgroups and properties of cancer biopsy samples is a crucial step towards precise diagnosis and personalized treatment of cancer patients. Recent data collections provide a comprehensive characterization of cancer cells, including genetic data on copy number alterations (CNAs). We explore the potential of a novel topology-based approach to capture the information contained in cancer genomic data: each cancer sample is encoded as a persistence diagram of topological features, i.e., high-dimensional voids represented in the data. We find that this technique has the potential to extract meaningful low-dimensional representations of somatic cancer genetic data, and we demonstrate the viability of some applications, namely finding substructures in cancer data and comparing the similarity of cancer types.
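The abstract does not specify how a CNA profile is turned into a persistence diagram. As one illustrative possibility only (an assumption, not the authors' pipeline), a sample's copy-number values along a chromosome can be treated as a 1-D function, and its dimension-0 sublevel-set persistence diagram computed with a union-find structure and the elder rule. The function name `sublevel_persistence_1d` is hypothetical. Note that this sketch captures only connected components (dimension 0), not the higher-dimensional voids described above; those would require a full simplicial filtration.

```python
import math
from typing import List, Tuple


def sublevel_persistence_1d(values: List[float]) -> List[Tuple[float, float]]:
    """Hypothetical sketch: dimension-0 sublevel-set persistence of a 1-D signal.

    Positions of `values` are vertices; consecutive positions are joined by edges.
    Returns sorted (birth, death) pairs; the component born at the global minimum
    never dies and is reported with death = math.inf.
    """
    n = len(values)
    parent = list(range(n))
    birth = [0.0] * n      # birth value of the component rooted at each index
    added = [False] * n
    pairs: List[Tuple[float, float]] = []

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add positions in increasing order of value (ties broken by position).
    for i in sorted(range(n), key=lambda k: (values[k], k)):
        added[i] = True
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the younger component (larger birth value) dies here.
                if birth[ri] <= birth[rj]:
                    old, young = ri, rj
                else:
                    old, young = rj, ri
                if values[i] > birth[young]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old

    if n:
        pairs.append((min(values), math.inf))  # essential class
    return sorted(pairs)
```

For example, `sublevel_persistence_1d([0, 3, 1, 4, 2])` returns `[(0, inf), (1, 3), (2, 4)]`: the local minima at 1 and 2 merge into the older component at heights 3 and 4. Diagrams obtained this way could then be compared with a diagram distance (e.g., bottleneck or Wasserstein) to assess similarity between samples, again under the stated assumption.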