Isoforms are distinct mRNAs produced from the same gene locus through alternative splicing. Studies have shown that more than 95% of human multi-exon genes undergo alternative splicing. Although the isoforms of a gene may differ only slightly in mRNA sequence, these differences can have a systematic effect on cell function and regulation. It is widely reported that isoforms of a gene can have distinct or even contrasting functions, and many studies have shown that alternative splicing plays a significant role in human health and disease. Despite the wide range of gene function studies, little is known about the functions of individual isoforms. Recently, computational methods based on Multiple Instance Learning (MIL) have been proposed to predict isoform function from gene function annotations and gene expression profiles. However, their performance is limited by the lack of labeled training data. In addition, probabilistic models such as Conditional Random Fields (CRFs) have been used to model the relations between isoforms. This project combines all the available data, including isoform sequences, expression profiles, and gene ontology graphs, and proposes a comprehensive model based on Deep Neural Networks. The UniProt Gene Ontology (GO) database is used as the standard reference for gene functions. The NCBI RefSeq database is used to extract gene and isoform sequences, and the NCBI SRA database provides the expression profile data. Prediction accuracy is measured with the area under the Receiver Operating Characteristic curve (ROC AUC) and the area under the Precision-Recall curve (PR AUC).
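As a rough illustration of the two evaluation metrics named in the abstract, the following minimal Python sketch computes ROC AUC and an estimate of PR AUC for a single GO term from binary labels and predicted scores. It is not the project's evaluation code. The function names `roc_auc` and `average_precision` are hypothetical, and average precision is used here as one common estimator of the area under the precision-recall curve, which is an assumption about how PR AUC is computed.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: list of 0/1 ground-truth annotations for one GO term.
    scores: list of predicted scores, higher means more likely positive.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of items sharing the same score.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of precision values at each positive hit.

    Items are ranked by descending score; ties are broken by input order,
    which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")

    order = sorted(range(len(scores)), key=lambda idx: -scores[idx])
    true_positives = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / n_pos


# Toy example (made-up values, not real predictions).
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print("ROC AUC:", roc_auc(toy_labels, toy_scores))
print("PR AUC (average precision):", average_precision(toy_labels, toy_scores))
```

Because function labels are typically sparse (few positives per GO term), PR AUC tends to be the more informative of the two when classes are heavily imbalanced, which is one plausible reason to report both.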