While several Gaussian mixture model-based biclustering approaches exist in the literature for continuous data, approaches for discrete data have not been well researched. A multivariate Poisson-lognormal (MPLN) model-based biclustering approach that utilizes a block-diagonal covariance structure is introduced to allow for a more flexible structure of the covariance matrix. Two variations of the algorithm are developed, in which the number of column clusters (1) is assumed equal across groups or (2) can vary across groups. A variational Gaussian approximation is used for parameter estimation, and information criteria are used for model selection. The proposed models are investigated in the context of clustering multivariate count data. On simulated data, the models display strong accuracy and computational efficiency; they are also applied to breast cancer RNA-sequencing data from The Cancer Genome Atlas.
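To make the model structure concrete, the following is a minimal sketch of how count data could be simulated from a single MPLN component whose latent covariance is block-diagonal, so that variables in the same column cluster are correlated and variables in different column clusters are not. The function names, the equicorrelated form of each block, and all parameter values are illustrative assumptions and are not taken from the proposed method, which may additionally include terms such as a library-size offset.

```python
import numpy as np


def block_diagonal_covariance(block_sizes, variances, correlations):
    """Assemble a block-diagonal covariance matrix (hypothetical helper).

    Each block corresponds to one column cluster. Within a block, every
    variable has the same variance and every pair has the same correlation
    (an assumption made only for this illustration). Entries linking
    different blocks are zero.
    """
    p = sum(block_sizes)
    sigma = np.zeros((p, p))
    start = 0
    for size, var, rho in zip(block_sizes, variances, correlations):
        block = np.full((size, size), rho * var)  # off-diagonal covariances
        np.fill_diagonal(block, var)              # variances on the diagonal
        sigma[start:start + size, start:start + size] = block
        start += size
    return sigma


def simulate_mpln(n, mu, sigma, rng):
    """Draw n observations from one MPLN component (hypothetical helper).

    Latent layer:   theta_i ~ N(mu, sigma)
    Observed layer: Y_ij | theta_ij ~ Poisson(exp(theta_ij))
    """
    theta = rng.multivariate_normal(mu, sigma, size=n)
    return rng.poisson(np.exp(theta))


rng = np.random.default_rng(0)

# Two column clusters containing 2 and 3 variables respectively.
sigma = block_diagonal_covariance(
    block_sizes=[2, 3], variances=[0.5, 0.3], correlations=[0.8, 0.6]
)
mu = np.full(5, 2.0)  # latent mean on the log scale
counts = simulate_mpln(200, mu, sigma, rng)
```

In a biclustering setting, each row cluster (group) would have its own mean vector and its own block-diagonal covariance; the blocks then define the column clusters within that group.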