Biclustering is a powerful data mining technique that simultaneously clusters the rows (observations) and columns (features) of a data matrix, producing a checkerboard-like pattern that supports visualization and exploratory analysis in a wide array of domains. Multiple biclustering algorithms have been developed in the past two decades, among which convex biclustering can guarantee a global optimum by formulating the task as a convex optimization problem. On the other hand, the application of biclustering has not kept pace with algorithmic development. For example, biclustering is under-applied to increasingly popular microbiome research data, possibly because of the compositional constraint on each sample. In this manuscript, we propose a new convex biclustering algorithm for general setups, called bi-ADMM, based on the ADMM algorithm. Unlike existing convex biclustering algorithms, it requires no extra smoothing steps to visualize informative biclusters. Furthermore, we tailor it into a variant named biC-ADMM, designed specifically to handle the compositional constraints encountered in microbiome data. The key step of our methods uses the Sylvester equation to derive the ADMM algorithm, which is new to clustering research. The effectiveness of the proposed methods is examined through a variety of numerical experiments and a microbiome data application.
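
To illustrate why a Sylvester equation can appear in an ADMM scheme for convex biclustering, the following minimal sketch solves one hypothetical primal update. It is not the bi-ADMM update itself, which the abstract does not specify. It assumes a generic splitting in which row differences and column differences of the estimate A are matched to auxiliary variables V and Z, so that A minimizes 0.5*||X - A||^2 + (nu1/2)*||D_row A - V||^2 + (nu2/2)*||A D_col^T - Z||^2. Setting the gradient to zero gives M A + A N = C, which is a Sylvester equation. The names `pairwise_difference_operator`, `sylvester_update`, `nu1`, and `nu2` are illustrative assumptions; only `scipy.linalg.solve_sylvester` is an existing library call.

```python
from itertools import combinations

import numpy as np
from scipy.linalg import solve_sylvester


def pairwise_difference_operator(n):
    """Return a matrix whose rows are e_i - e_j for all pairs i < j."""
    pairs = list(combinations(range(n), 2))
    D = np.zeros((len(pairs), n))
    for k, (i, j) in enumerate(pairs):
        D[k, i] = 1.0
        D[k, j] = -1.0
    return D


def sylvester_update(X, V, Z, nu1=1.0, nu2=1.0):
    """Solve the hypothetical A-update M A + A N = C (an assumed splitting).

    X: (n, p) data matrix.
    V: (n_row_pairs, p) auxiliary variable for row differences D_row @ A.
    Z: (n, n_col_pairs) auxiliary variable for column differences A @ D_col.T.
    """
    n, p = X.shape
    D_row = pairwise_difference_operator(n)
    D_col = pairwise_difference_operator(p)
    M = np.eye(n) + nu1 * D_row.T @ D_row      # (n, n), eigenvalues >= 1
    N = nu2 * D_col.T @ D_col                  # (p, p), eigenvalues >= 0
    C = X + nu1 * D_row.T @ V + nu2 * Z @ D_col
    A = solve_sylvester(M, N, C)               # solves M @ A + A @ N = C
    return A, (M, N, C)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
V = rng.normal(size=(10, 4))   # 5 rows -> 10 row pairs
Z = rng.normal(size=(5, 6))    # 4 columns -> 6 column pairs
A, (M, N, C) = sylvester_update(X, V, Z)
print(np.allclose(M @ A + A @ N, C))
```

Because M has eigenvalues of at least 1 and N is positive semidefinite, M and -N share no eigenvalues, so this particular system has a unique solution. In a full ADMM loop the right-hand side would also carry dual variables, which are omitted here for brevity.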