Paper Title

A New Algorithm for Convex Biclustering and Its Extension to the Compositional Data

Authors

Wang, Binhuan; Yao, Lanqiu; Hu, Jiyuan; Li, Huilin

Abstract

Biclustering is a powerful data mining technique that simultaneously clusters the rows (observations) and columns (features) of a matrix-format data set, producing results in a checkerboard-like pattern for visualization and exploratory analysis in a wide array of domains. Multiple biclustering algorithms have been developed in the past two decades, among which convex biclustering can guarantee a global optimum by formulating it as a convex optimization problem. On the other hand, the application of biclustering has not progressed in parallel with the algorithmic techniques. For example, biclustering of increasingly popular microbiome research data is under-applied, possibly because of the compositional constraint on each sample. In this manuscript, we propose a new convex biclustering algorithm, called bi-ADMM, under general setups based on the ADMM algorithm, which does not need the extra smoothing steps that existing convex biclustering algorithms require to visualize informative biclusters. Furthermore, we tailor it into an algorithm named biC-ADMM specifically to tackle the compositional constraints encountered in microbiome data. The key step of our methods uses the Sylvester equation to derive the ADMM algorithm, which is new to clustering research. The effectiveness of the proposed methods is examined through a variety of numerical experiments and a microbiome data application.
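
The abstract names the Sylvester equation as the key step but does not give the formulation. The following is a minimal sketch of how such an equation can arise, not the authors' implementation. It assumes the standard convex biclustering objective (a squared-error fit plus fused penalties on all pairwise row and column differences) and a scaled-form ADMM splitting with auxiliary variables `V`, `Z` and dual variables `L1`, `L2`. Under those assumptions, the update for the estimate `A` solves `M A + A N = C` with symmetric `M` and `N`. All function names, the splitting, and the penalty parameters `nu1`, `nu2` are hypothetical.

```python
import numpy as np

def difference_matrix(n):
    """Rows e_i - e_j for every pair i < j; shape (n*(n-1)/2, n)."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    D = np.zeros((len(pairs), n))
    for k, (i, j) in enumerate(pairs):
        D[k, i], D[k, j] = 1.0, -1.0
    return D

def solve_symmetric_sylvester(M, N, C):
    """Solve M A + A N = C for symmetric M and N via eigendecomposition.

    With M = U diag(a) U^T and N = W diag(b) W^T, the equation decouples
    into (a_i + b_j) * At_ij = (U^T C W)_ij, where At = U^T A W.
    """
    a, U = np.linalg.eigh(M)
    b, W = np.linalg.eigh(N)
    C_tilde = U.T @ C @ W
    A_tilde = C_tilde / (a[:, None] + b[None, :])
    return U @ A_tilde @ W.T

def a_update(X, V, Z, L1, L2, nu1=1.0, nu2=1.0):
    """Assumed A-update: minimize over A
    0.5*||X - A||^2 + nu1/2*||Dr A - V + L1||^2 + nu2/2*||A Dc^T - Z + L2||^2.
    """
    n, p = X.shape
    Dr = difference_matrix(n)            # row differences: Dr @ A
    Dc = difference_matrix(p)            # column differences: A @ Dc.T
    M = np.eye(n) + nu1 * Dr.T @ Dr      # positive definite
    N = nu2 * Dc.T @ Dc                  # positive semidefinite
    C = X + nu1 * Dr.T @ (V - L1) + nu2 * (Z - L2) @ Dc
    return solve_symmetric_sylvester(M, N, C)
```

Because `M` has eigenvalues of at least 1 and `N` is positive semidefinite, every denominator `a_i + b_j` is positive, so the solve is well defined. Handling the compositional constraint addressed by biC-ADMM would need an additional step that this sketch does not attempt.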
