This paper addresses three problems in protein classification. First, classifying carbohydrate-active enzymes (CAZymes) helps in understanding the properties of enzymes, but a single CAZyme may belong to several classes, so the task becomes multi-label CAZyme classification. Second, to capture information from the secondary structure of a protein, protein classification is modeled as a graph classification problem. Third, compound-protein interaction prediction combines graph learning for the compound with a sequential embedding for the protein, and can be viewed as a classification task over compound-protein pairs. This paper proposes one model for each problem. For the first, it proposes a multi-label CAZyme classification model that uses a CNN-LSTM with an attention mechanism. For the second, it proposes a subspace learning model based on a variational graph autoencoder for protein graph classification. For the third, it proposes combining graph isomorphism networks (GIN) with an attention-based CNN-LSTM for compound-protein interaction prediction, and compares GIN with graph convolutional networks (GCN) and graph attention networks (GAT) on this task. The proposed models are effective for protein classification. Source code and data are available at https://github.com/zshicode/GNN-AttCL-protein. The repository also collects and collates benchmark datasets for these problems, covering CAZyme classification, enzyme protein graph classification, compound-protein interaction prediction, drug-target affinity prediction, and drug-drug interaction prediction, which makes evaluation on these benchmarks more convenient.
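To illustrate why a CAZyme that belongs to several classes calls for a multi-label formulation, the following minimal sketch scores each class with an independent sigmoid and keeps every class above a threshold, rather than picking a single class with a softmax. It is not the paper's implementation: the function names, the 0.5 threshold, the example logits, and the choice of the six CAZy class abbreviations as the label set are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits, class_names, threshold=0.5):
    """Return every class whose independent probability meets the threshold.

    Each class is decided on its own, so a protein can receive zero,
    one, or several labels (hypothetical helper, not from the paper).
    """
    return [name for name, z in zip(class_names, logits) if sigmoid(z) >= threshold]


# Assumed label set: the six CAZy class abbreviations.
CAZY_CLASSES = ["GH", "GT", "PL", "CE", "AA", "CBM"]

# Made-up per-class scores for one protein, standing in for model output.
logits = [2.1, -1.3, -0.7, 0.4, -3.0, 1.5]

print(predict_labels(logits, CAZY_CLASSES))  # ['GH', 'CE', 'CBM']
```

In a trained model, the logits would come from the final layer of the sequence encoder, and a per-class binary cross-entropy loss is the usual training counterpart of this decision rule.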