Biclustering is a technique that simultaneously clusters the observations and the features of a dataset. It is often used in bioinformatics, text mining, and time series analysis. An important advantage of biclustering algorithms is their ability to uncover multiple ``views'' of the data (i.e., through row and column groupings). Several Gaussian mixture model-based biclustering approaches currently exist in the literature. However, they impose severe restrictions on the structure of the covariance matrix. Here, we propose a Gaussian mixture model-based biclustering approach that provides a more flexible block-diagonal covariance structure. We show that the clustering accuracy of the proposed model is comparable to that of other known techniques, while our approach provides a more flexible covariance structure and requires substantially less computation time. We demonstrate the application of the proposed model in bioinformatics and topic modelling.