Paper title
Bidimensional linked matrix factorization for pan-omics pan-cancer analysis
Paper authors
Paper abstract
Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer (pan-omics pan-cancer analysis) have extended our knowledge of molecular heterogeneity beyond what was observed in single-tumor and single-platform studies. However, these studies have been limited by available statistical methodology. We propose a flexible approach to the simultaneous factorization and decomposition of variation across such bidimensionally linked matrices, BIDIFAC+. This decomposes variation into a series of low-rank components that may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., cancer types). This builds on a growing literature for the factorization and decomposition of linked matrices, which has primarily focused on multiple matrices that are linked in one dimension (rows or columns) only. Our objective function extends nuclear norm penalization, is motivated by random matrix theory, gives an identifiable decomposition under relatively mild conditions, and can be shown to give the mode of a Bayesian posterior distribution. We apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and specific modes of variability across 4 different omics platforms and 29 different cancer types.
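
To make the phrase "nuclear norm penalization" concrete, the following is a minimal sketch of the standard single-matrix building block, not the BIDIFAC+ algorithm itself. For one matrix X, minimizing 0.5 * ||X - S||_F^2 + lam * ||S||_* over S is solved exactly by soft-thresholding the singular values of X, which yields a low-rank estimate. The function name `soft_threshold_svd`, the simulated data, and the penalty choice lam = sqrt(m) + sqrt(n) are assumptions for illustration. That penalty is used here because it bounds the expected largest singular value of an m x n matrix of independent unit-variance Gaussian noise, which is one way random matrix theory can motivate a penalty level.

```python
import numpy as np


def soft_threshold_svd(X, lam):
    """Minimize 0.5 * ||X - S||_F^2 + lam * ||S||_* over S.

    The minimizer keeps the singular vectors of X and shrinks each
    singular value d_i to max(d_i - lam, 0).
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    d_shrunk = np.maximum(d - lam, 0.0)
    S_hat = (U * d_shrunk) @ Vt  # scale column i of U by d_shrunk[i]
    return S_hat, d_shrunk


def objective(X, S, lam):
    """Penalized least-squares objective with a nuclear norm penalty."""
    fit = 0.5 * np.linalg.norm(X - S, "fro") ** 2
    penalty = lam * np.linalg.norm(S, "nuc")
    return fit + penalty


# Simulated example (hypothetical data): rank-2 signal plus unit-variance noise.
rng = np.random.default_rng(0)
m, n, r = 60, 40, 2
signal = 3.0 * rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
X = signal + rng.normal(size=(m, n))

# Assumed penalty level for unit-variance noise.
lam = np.sqrt(m) + np.sqrt(n)

S_hat, d_shrunk = soft_threshold_svd(X, lam)
estimated_rank = int(np.sum(d_shrunk > 0))
print("estimated rank:", estimated_rank)
```

In a linked setting such as the one the abstract describes, one would expect a sum of such penalized components, each confined to a subset of row sets and column sets, but the exact objective and penalty weights should be taken from the paper rather than from this sketch.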