Early detection of cancer has been widely explored because of its paramount importance in biomedicine. Among the types of data used to address this question, T cell receptor (TCR) sequences have recently come into the spotlight owing to a growing appreciation of the role the host immune system plays in tumor biology. However, the one-to-many correspondence between a patient and their many TCR sequences prevents researchers from directly applying classical statistical and machine learning methods. Recent work has modeled this type of data within the multiple instance learning (MIL) framework. Although this novel application of MIL to cancer detection from TCR sequences has shown adequate performance for several tumor types, there is still room for improvement, particularly for certain cancer types. Furthermore, explainable neural network models have not been fully investigated for this application. In this article, we propose multiple instance neural networks based on sparse attention (MINN-SA) to improve both cancer-detection performance and explainability. The sparse attention structure drops uninformative instances from each bag, which, combined with a skip connection, yields both interpretability and better predictive performance. Our experiments show that, compared with existing MIL approaches, MINN-SA achieves the highest area under the receiver operating characteristic curve (AUC) on average across 10 cancer types. Moreover, the estimated attention weights show that MINN-SA can identify the TCRs within a T cell repertoire that are specific to tumor antigens.
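To make the idea of sparse attention over a bag of TCRs more concrete, here is a minimal plain-Python sketch of how a single patient (bag) might be scored. It assumes the sparse attention is realized with sparsemax (Martins and Astudillo, 2016), a softmax alternative that assigns exactly zero weight to low-scoring instances; the abstract does not name the sparse-attention function, so this is an assumption. The names `bag_probability`, `w_attn`, `v_clf`, and `bias`, the toy three-dimensional embeddings, and the form of the skip connection (adding the unweighted mean instance embedding to the attention-pooled one) are all hypothetical and are not taken from MINN-SA itself.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sparsemax(scores: Sequence[float]) -> Vector:
    """Project scores onto the probability simplex (sparsemax).

    Unlike softmax, low-scoring entries receive exactly 0.0 weight.
    """
    z = sorted(scores, reverse=True)
    cumsum, support_sum, k_max = 0.0, 0.0, 0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        # k stays in the support while 1 + k * z_(k) > sum of the top-k scores
        if 1.0 + k * zk > cumsum:
            k_max, support_sum = k, cumsum
    tau = (support_sum - 1.0) / k_max  # threshold subtracted from every score
    return [max(s - tau, 0.0) for s in scores]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bag_probability(instances: List[Vector], w_attn: Vector,
                    v_clf: Vector, bias: float) -> Tuple[float, Vector]:
    """Score one bag (hypothetical setup): sparse attention pooling,
    an assumed mean-embedding skip path, then a logistic classifier."""
    scores = [dot(x, w_attn) for x in instances]  # one attention score per TCR
    attn = sparsemax(scores)                      # uninformative TCRs get 0.0
    dim = len(instances[0])
    pooled = [sum(a * x[j] for a, x in zip(attn, instances)) for j in range(dim)]
    # Hypothetical skip connection: add the unweighted mean embedding
    mean = [sum(x[j] for x in instances) / len(instances) for j in range(dim)]
    bag = [p + m for p, m in zip(pooled, mean)]
    logit = dot(v_clf, bag) + bias
    return 1.0 / (1.0 + math.exp(-logit)), attn


if __name__ == "__main__":
    # Toy bag of four TCR embeddings (made-up numbers for illustration)
    bag_of_tcrs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
    prob, attn = bag_probability(bag_of_tcrs, w_attn=[2.0, 0.0, -1.0],
                                 v_clf=[1.0, 1.0, 1.0], bias=-1.0)
    print("attention weights:", attn)
    print("P(cancer):", prob)
```

In this toy bag, sparsemax keeps only the first and third TCRs and gives the other two exactly zero weight, which is the kind of instance dropping that makes the attention weights directly readable. Sparsemax is chosen here only because it is a well-known way to obtain exact zeros; the paper's own mechanism may differ.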