Biclustering is a powerful data mining technique that simultaneously clusters the rows (observations) and columns (features) of a matrix-format data set, producing a checkerboard-like pattern that supports visualization and exploratory analysis in a wide array of domains. Many biclustering algorithms have been developed over the past two decades; among them, convex biclustering can guarantee a global optimum because it formulates the task as a convex optimization problem. The application of biclustering, however, has not kept pace with algorithm design. For example, biclustering is under-applied to increasingly popular microbiome research data, and one reason may be the compositional constraints of such data. In this manuscript, we propose a new convex biclustering algorithm for general settings, called bi-ADMM, based on the alternating direction method of multipliers (ADMM). Unlike existing convex biclustering algorithms, it requires no extra smoothing steps to obtain informative biclusters. We further tailor it into an algorithm named biC-ADMM, designed specifically to handle the compositional constraints encountered in microbiome data. The key step of our methods relies on the Sylvester equation, which is new to clustering research. The effectiveness of the proposed methods is examined through a variety of numerical experiments and a microbiome data application.
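
To illustrate how a Sylvester equation can arise in this kind of problem, the sketch below solves a single quadratic subproblem with separate penalties on row differences and column differences. It is not the bi-ADMM update itself: the particular splitting, the names `D_r`, `D_c`, `V`, `Z`, and `rho`, and the zero-initialized auxiliary variables are all assumptions made for illustration. Setting the gradient of 0.5·||X − data||² + (rho/2)·||D_r X − V||² + (rho/2)·||X D_cᵀ − Z||² to zero gives an equation of the form A X + X B = Q, which is solved here by Kronecker vectorization.

```python
import numpy as np

def difference_matrix(n):
    """Pairwise-difference operator: one row per pair (i, j), i < j, holding +1 and -1."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    D = np.zeros((len(pairs), n))
    for k, (i, j) in enumerate(pairs):
        D[k, i], D[k, j] = 1.0, -1.0
    return D

def solve_sylvester_kron(A, B, Q):
    """Solve A X + X B = Q via column-major vectorization (small matrices only)."""
    n, p = Q.shape
    # vec(A X) = (I_p kron A) vec(X), vec(X B) = (B^T kron I_n) vec(X)
    K = np.kron(np.eye(p), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, Q.flatten(order="F"))
    return x.reshape((n, p), order="F")

rng = np.random.default_rng(0)
n, p, rho = 5, 4, 1.0                # hypothetical sizes and penalty parameter
data = rng.normal(size=(n, p))
D_r = difference_matrix(n)           # acts on rows:    D_r @ X
D_c = difference_matrix(p)           # acts on columns: X @ D_c.T
V = np.zeros((D_r.shape[0], p))      # assumed auxiliary variable for row differences
Z = np.zeros((n, D_c.shape[0]))      # assumed auxiliary variable for column differences

# Stationarity condition rearranged into Sylvester form A X + X B = Q.
A = np.eye(n) + rho * D_r.T @ D_r
B = rho * D_c.T @ D_c
Q = data + rho * D_r.T @ V + rho * Z @ D_c

X = solve_sylvester_kron(A, B, Q)
residual = np.linalg.norm(A @ X + X @ B - Q)
```

The Kronecker approach builds an (n·p) × (n·p) system, so it only suits small examples; for realistic sizes a dedicated solver such as `scipy.linalg.solve_sylvester` would be the more practical choice.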