Gene annotation addresses the problem of predicting unknown associations between genes and functions (e.g., biological processes) of a specific organism. Despite recent advances, the cost and time demanded by annotation procedures that rely largely on in vivo biological experiments remain prohibitively high. This paper presents a novel in silico approach to the annotation problem that combines cluster analysis and hierarchical multi-label classification (HMC). The approach uses spectral clustering to extract new features from the gene co-expression network (GCN) and enrich the prediction task. HMC is used to build multiple estimators that consider the hierarchical structure of gene functions. The proposed approach is applied to a case study on Zea mays, one of the most dominant and productive crops in the world. The results illustrate how in silico approaches are key to reducing the time and cost of gene annotation. More specifically, they highlight the importance of: (i) building new features that represent the structure of gene relationships in GCNs to annotate genes; and (ii) taking into account the structure of biological processes to obtain consistent predictions.
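The two ideas named in the abstract, spectral features from the GCN and hierarchy-consistent predictions, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `spectral_features` and `enforce_hierarchy`, the toy network, and the rule of capping a child's score by its parent's score are all assumptions made for illustration, and the hierarchy is simplified to a tree whose parents are indexed before their children.

```python
import numpy as np

def spectral_features(adjacency, k):
    """Return k spectral-embedding features per gene (hypothetical helper).

    `adjacency` is a symmetric, non-negative co-expression matrix.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    # Guard against isolated genes (degree 0) to avoid division by zero.
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]

def enforce_hierarchy(scores, parent):
    """Cap each function's score by its parent's score (assumed rule).

    parent[j] is the index of function j's parent, or -1 for a root.
    Assumes a tree in which every parent index precedes its children.
    """
    out = np.array(scores, dtype=float)
    for j, p in enumerate(parent):
        if p >= 0:
            out[:, j] = np.minimum(out[:, j], out[:, p])
    return out

# Toy GCN (assumed data): two gene triangles joined by one weak edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1

feats = spectral_features(A, k=2)

# Raw scores for one gene over a chain of functions: root -> child -> grandchild.
raw = np.array([[0.9, 0.4, 0.7]])
consistent = enforce_hierarchy(raw, parent=[-1, 0, 1])
```

In this toy network the second feature column takes opposite signs on the two triangles, so a downstream classifier receives the cluster structure as an input feature. The capping step keeps a gene from scoring higher on a specific function than on its more general ancestor, which is one simple way to obtain predictions that agree with the hierarchy.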