Cancer subtyping is crucial for understanding the nature of tumors and selecting suitable therapy. However, existing labelling methods are medically controversial, which has pushed subtyping away from relying on teaching signals. Moreover, cancer gene expression profiles are high-dimensional, scarce, and exhibit complicated dependence, posing a serious challenge to existing subtyping models that must output sensible clusterings. In this study, we propose a novel clustering method that exploits gene expression profiles to distinguish subtypes in an unsupervised manner. The proposed method adaptively learns a categorical correspondence from latent representations of expression profiles to the subtypes output by the model. By maximizing the problem-agnostic mutual information between input expression profiles and output subtypes, our method can automatically decide a suitable number of subtypes. Through experiments, we demonstrate that the proposed method can refine existing controversial labels, and further medical analysis shows that this refinement is highly correlated with cancer survival rates.
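To make the mutual-information objective concrete, the following is a minimal sketch of how the mutual information between inputs and soft cluster assignments is commonly estimated, using the identity I(X; Y) = H(Y) - H(Y | X) with the marginal taken as the average assignment over samples. It is an illustration under stated assumptions, not the authors' implementation: the function names `entropy` and `mutual_information` and the toy assignment matrices are hypothetical, and the abstract does not specify the exact estimator or network used.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution; 0 * log 0 is treated as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mutual_information(assignments):
    """Estimate I(X; Y) from soft assignments (hypothetical helper).

    assignments: list of rows, where row i is p(y | x_i) over K candidate subtypes.
    """
    n = len(assignments)
    k = len(assignments[0])
    # Empirical marginal p(y): average assignment across all samples.
    marginal = [sum(row[j] for row in assignments) / n for j in range(k)]
    # Conditional entropy H(Y | X): average per-sample assignment entropy.
    conditional = sum(entropy(row) for row in assignments) / n
    return entropy(marginal) - conditional


# Toy data (assumed): four samples, three candidate subtypes.
# Confident assignments that use only two of the three candidates.
confident = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
# Completely uninformative assignments.
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in range(4)]

mi_confident = mutual_information(confident)
mi_uniform = mutual_information(uniform)
```

In this toy case the confident assignments carry about log 2 nats of information while the uniform ones carry essentially none. The third candidate subtype receives no mass in the confident case, which hints at how an objective of this kind can leave surplus clusters unused; whether the proposed method selects the number of subtypes this way is not stated in the abstract.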