The spatial transcriptomics (ST) data produced by recent biotechnologies, such as CosMx and Xenium, contain huge amount of information about cancer tissue samples, which has great potential for cancer research via detection of community: a collection of cells with distinct cell-type composition and similar neighboring patterns. But existing clustering methods do not work well for community detection of CosMx ST data, and the commonly used kNN compositional data method shows lack of informative neighboring cell patterns for huge CosMx data. In this article, we propose a novel and more informative disk compositional data (DCD) method, which identifies neighboring patterns of each cell via taking into account of ST data features from recent new technologies. After initial processing ST data into DCD matrix, a new innovative and interpretable DCD-TMHC community detection method is proposed here. Extensive simulation studies and CosMx breast cancer data analysis clearly show that our proposed DCD-TMHC method is superior to other methods. Based on the communities detected by DCD-TMHC method for CosMx breast cancer data, the logistic regression analysis results demonstrate that DCD-TMHC method is clearly interpretable and superior, especially in terms of assessment for different stages of cancer. These suggest that our proposed novel, innovative, informative and interpretable DCD-TMHC method here will be helpful and have impact to future cancer research based on ST data, which can improve cancer diagnosis and monitor cancer treatment progress.
翻译:暂无翻译