Identifying co-regulated genes and their transcription-factor binding sites (TFBS) is a key step toward understanding transcriptional regulation. In addition to effective laboratory assays, various bi-clustering algorithms have been developed to detect co-expressed genes. Applied to gene expression data, bi-clustering methods discover subgroups of genes with similar expression patterns under subsets of experimental conditions that are themselves to be identified. This paper proposes a novel fuzzy bi-clustering algorithm for identifying co-regulated genes, built on two fuzzy partition matrices of the gene expression data constructed with Axiomatic Fuzzy Set (AFS) theory. Specifically, the gene expression data are first transformed into two fuzzy partition matrices via the sub-preference relations theory of AFS. One matrix treats the genes as the universe and the conditions as the concepts; the other treats the conditions as the universe and the genes as the concepts. Co-regulated genes (bi-clusters) are identified on the two partition matrices simultaneously. A novel fuzzy similarity criterion is then defined on the partition matrices, and a cyclic optimization algorithm is designed to discover bi-clusters that are significant at the expression level. These procedures guarantee that the generated bi-clusters have more significant expression values than those extracted by traditional bi-clustering methods. Finally, the proposed method is compared with three well-known bi-clustering algorithms on publicly available real microarray datasets. The experimental results agree with the theoretical analysis and show that the proposed algorithm can effectively detect co-regulated genes without any prior knowledge of the gene expression data.
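The two-view idea described above can be illustrated with a minimal Python sketch. It is not the paper's method: min-max scaling stands in for the AFS membership construction, the element-wise minimum stands in for the fuzzy similarity criterion, and a simple alternating update stands in for the cyclic optimization. The function names `two_view_memberships` and `alternate_bicluster`, the `threshold` parameter, and the seeding by a single gene are all assumptions made for illustration.

```python
import numpy as np


def two_view_memberships(X):
    """Return two [0, 1] membership matrices with the same shape as X.

    First matrix: each condition (column) is a concept scored over the
    universe of genes. Second matrix: each gene (row) is a concept scored
    over the universe of conditions.
    ASSUMPTION: min-max scaling is a placeholder, not the AFS construction.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0, keepdims=True)
    col_max = X.max(axis=0, keepdims=True)
    row_min = X.min(axis=1, keepdims=True)
    row_max = X.max(axis=1, keepdims=True)
    # Constant rows or columns get a range of 1 to avoid division by zero.
    col_range = np.where(col_max > col_min, col_max - col_min, 1.0)
    row_range = np.where(row_max > row_min, row_max - row_min, 1.0)
    return (X - col_min) / col_range, (X - row_min) / row_range


def alternate_bicluster(X, seed_gene, threshold=0.5, max_iter=20):
    """Grow one bi-cluster from a seed gene by alternating updates.

    ASSUMPTION: the element-wise minimum (a fuzzy AND) of the two views is
    used as a hypothetical combined score; the paper defines its own
    similarity criterion, which is not reproduced here.
    """
    u_genes, u_conds = two_view_memberships(X)
    score = np.minimum(u_genes, u_conds)

    genes = np.zeros(score.shape[0], dtype=bool)
    genes[seed_gene] = True
    conds = np.ones(score.shape[1], dtype=bool)

    for _ in range(max_iter):
        # Keep conditions whose mean score over the current genes is high.
        new_conds = score[genes].mean(axis=0) >= threshold
        if not new_conds.any():
            break
        # Keep genes whose mean score over those conditions is high.
        new_genes = score[:, new_conds].mean(axis=1) >= threshold
        if not new_genes.any():
            break
        if np.array_equal(new_genes, genes) and np.array_equal(new_conds, conds):
            break  # fixed point reached
        genes, conds = new_genes, new_conds

    return np.flatnonzero(genes), np.flatnonzero(conds)
```

Calling `alternate_bicluster(X, seed_gene=0)` on a genes-by-conditions array returns the row and column indices of one candidate bi-cluster. Working on both views at once means a cell counts as strongly expressed only if it is high relative to other genes under the same condition and also high relative to other conditions for the same gene.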