Cancer is one of the most challenging diseases because of its complexity, variability, and diversity of causes. Although it has been a major research topic for decades, it remains poorly understood, and multifaceted therapeutic frameworks are therefore indispensable. \emph{Anticancer peptides} (ACPs) are among the most promising treatment options, but their large-scale identification and synthesis require reliable prediction methods, which remain an open problem. In this paper, we present an intuitive classification strategy that differs from traditional \emph{black box} methods and is based on the well-known statistical theory of \emph{sparse-representation classification} (SRC). Specifically, we create over-complete dictionary matrices by embedding the \emph{composition of the K-spaced amino acid pairs} (CKSAAP). Unlike traditional SRC frameworks, which rely on the computationally expensive \emph{basis pursuit} solver, our strategy uses an efficient \emph{matching pursuit} solver. Furthermore, \emph{kernel principal component analysis} (KPCA) is employed to cope with non-linearity and to reduce the dimension of the feature space, whereas the \emph{synthetic minority oversampling technique} (SMOTE) is used to balance the dictionary. The proposed method is evaluated on two benchmark datasets using well-known statistical measures and is found to outperform existing methods. The results show the highest sensitivity with the most balanced accuracy, which might be beneficial in understanding structural and chemical aspects and developing new ACPs. A Google Colab implementation of the proposed method is available on the author's GitHub page (\href{https://github.com/ehtisham-Fazal/ACP-Kernel-SRC}{https://github.com/ehtisham-fazal/ACP-Kernel-SRC}).
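To make the pipeline concrete, the following is a minimal sketch of two of its ingredients: CKSAAP feature extraction and sparse-representation classification with a plain matching pursuit solver. It is not the authors' implementation. The function names (`cksaap`, `matching_pursuit`, `src_predict`), the parameters `k_max` and `n_iter`, and the toy peptide strings are all assumptions made for illustration, and the KPCA and SMOTE steps are omitted for brevity.

```python
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(seq, k_max=1):
    """Normalized counts of residue pairs separated by k = 0..k_max residues."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_windows = max(len(seq) - k - 1, 0)
        for i in range(n_windows):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip non-standard residues
                counts[pair] += 1
        total = max(n_windows, 1)
        features.extend(counts[p] / total for p in PAIRS)
    return features


def unit(v):
    """Scale a vector to unit Euclidean norm (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(atoms, y, n_iter=5):
    """Plain matching pursuit: greedy sparse code of y over unit-norm atoms."""
    coef = [0.0] * len(atoms)
    residual = list(y)
    for _ in range(n_iter):
        corrs = [dot(a, residual) for a in atoms]
        j = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        if abs(corrs[j]) < 1e-12:
            break
        coef[j] += corrs[j]
        residual = [r - corrs[j] * a for r, a in zip(residual, atoms[j])]
    return coef


def src_predict(atoms, labels, y, n_iter=5):
    """Pick the class whose atoms reconstruct y with the smallest residual."""
    coef = matching_pursuit(atoms, y, n_iter)
    best_label, best_err = None, float("inf")
    for label in sorted(set(labels)):
        recon = [0.0] * len(y)
        for c, a, l in zip(coef, atoms, labels):
            if l == label and c != 0.0:
                recon = [r + c * x for r, x in zip(recon, a)]
        err = math.sqrt(sum((u - v) ** 2 for u, v in zip(y, recon)))
        if err < best_err:
            best_label, best_err = label, err
    return best_label


# Toy dictionary: synthetic strings, NOT real peptides (1 = "ACP", 0 = "non-ACP").
train = [("KKLLKKLLKKLL", 1), ("KLKLLKKLLKLK", 1),
         ("DEDEGSGSDEDE", 0), ("GSDEGSDEDEGS", 0)]
atoms = [unit(cksaap(s)) for s, _ in train]
labels = [l for _, l in train]

query = unit(cksaap("KLLKKLLKLLKK"))
print(src_predict(atoms, labels, query))
```

Each dictionary atom is the unit-normalized CKSAAP vector of one training sequence, and a query is assigned to the class whose atoms alone give the smallest reconstruction error. Plain matching pursuit is used here because it needs no linear-algebra library; an orthogonal variant would re-fit the selected coefficients by least squares at every step.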