Interpretable Single-Cell Set Classification with Kernel Mean Embeddings

Modern single-cell flow and mass cytometry technologies measure the expression of several proteins in the individual cells of a blood or tissue sample. Each profiled biological sample is thus represented by a set of hundreds of thousands of multidimensional cell feature vectors, which makes it computationally expensive to predict each sample's associated phenotype with machine learning models. Such a large set cardinality also limits the interpretability of machine learning models, because it is difficult to track how each individual cell influences the final prediction. Using kernel mean embeddings to encode the cellular landscape of each profiled biological sample, we can train a simple linear classifier and achieve state-of-the-art classification accuracy on three flow and mass cytometry datasets. Our model contains few parameters but still performs similarly to deep learning models with millions of parameters. In contrast with deep learning approaches, the linearity and sub-selection step of our model make it easy to interpret classification results. Clustering analysis further shows that our method admits rich biological interpretability for linking cellular heterogeneity to clinical phenotype.
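The idea of embedding each sample's set of cells as a single vector and then fitting a linear classifier can be illustrated with a minimal sketch. The abstract does not state which kernel or approximation is used, so everything below is an assumption made for illustration: a Gaussian kernel approximated with random Fourier features, a plain gradient-descent logistic regression, synthetic 2-D "cells", and the hypothetical helper names `make_rff`, `mean_embedding`, `train_logistic`, `predict_proba`, and `toy_sample`. The sub-selection step mentioned in the abstract is not shown.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Draw random Fourier feature parameters for the (assumed) Gaussian
    kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # std of the kernel's spectral density
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return weights, offsets


def mean_embedding(cells, weights, offsets):
    """Approximate kernel mean embedding: average the feature map over all
    cells, so a sample of any size becomes one fixed-length vector."""
    n_features = len(weights)
    norm = math.sqrt(2.0 / n_features)
    total = [0.0] * n_features
    for cell in cells:
        for j in range(n_features):
            proj = sum(w * x for w, x in zip(weights[j], cell))
            total[j] += norm * math.cos(proj + offsets[j])
    return [t / len(cells) for t in total]


def predict_proba(embedding, coef, intercept):
    """Linear score passed through a sigmoid."""
    z = sum(c * e for c, e in zip(coef, embedding)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(embeddings, labels, lr=1.0, epochs=300):
    """Full-batch gradient descent on the logistic loss (illustrative only)."""
    n_features = len(embeddings[0])
    coef, intercept = [0.0] * n_features, 0.0
    n = len(embeddings)
    for _ in range(epochs):
        grad_c, grad_b = [0.0] * n_features, 0.0
        for emb, y in zip(embeddings, labels):
            err = predict_proba(emb, coef, intercept) - y
            grad_b += err
            for j in range(n_features):
                grad_c[j] += err * emb[j]
        intercept -= lr * grad_b / n
        coef = [c - lr * g / n for c, g in zip(coef, grad_c)]
    return coef, intercept


def toy_sample(rng, shifted_fraction, n_cells=100):
    """Synthetic sample: a mix of two 2-D cell populations (not real data)."""
    cells = []
    for _ in range(n_cells):
        center = 3.0 if rng.random() < shifted_fraction else 0.0
        cells.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
    return cells


# Usage: 16 synthetic samples, two phenotypes differing in population mix.
rng = random.Random(1)
weights, offsets = make_rff(dim=2, n_features=50, gamma=0.5)
labels = [0] * 8 + [1] * 8
samples = [toy_sample(rng, 0.1 if y == 0 else 0.5) for y in labels]
embeddings = [mean_embedding(s, weights, offsets) for s in samples]
coef, intercept = train_logistic(embeddings, labels)
probs = [predict_proba(e, coef, intercept) for e in embeddings]
```

In this sketch the classifier has only one coefficient per random feature plus an intercept, and the score of a sample is an average of per-cell contributions, which is one way a linear model on a mean embedding can be traced back to individual cells.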


