Paper title
From local to global gene co-expression estimation using single-cell RNA-seq data
Paper authors
Paper abstract
In genomics studies, investigating gene relationships often yields important biological insights. Large heterogeneous datasets now pose new challenges for statisticians because gene relationships are often local: they change from one sample point to another, may exist only in a subset of the sample, and can be non-linear or even non-monotone. Most previous dependence measures do not specifically target local dependence relationships, and the ones that do are computationally costly. In this paper, we explore a state-of-the-art network estimation technique, known as cell-specific gene networks, that characterizes gene relationships at the single-cell level. We first show that averaging the cell-specific gene relationship over a population gives a novel univariate dependence measure that can detect any non-linear, non-monotone relationship. Together with a consistent nonparametric estimator, we establish its robustness at both the population and empirical levels. Simulations and real data analysis show that this measure outperforms existing independence measures such as Pearson, Kendall's $τ$, $τ^\star$, distance correlation, HSIC, Hoeffding's D, HHG, and MIC on various tasks.
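As an illustration of why averaging local relationships can reveal non-monotone dependence that a linear measure misses, here is a minimal Python sketch. It is not the estimator from the paper: the function name `local_dependence`, the window fraction `frac`, and the window-based statistic itself are assumptions chosen for illustration. For each sample point it compares the fraction of points falling in a joint neighborhood with the product of the two marginal fractions, then averages that excess over all points.

```python
import numpy as np


def local_dependence(x, y, frac=0.1):
    """Hypothetical window-based score (illustrative, not the paper's estimator).

    For each point k, take a window of half-width frac * range around x[k]
    and around y[k]. Under independence, the fraction of points in both
    windows is close to the product of the two marginal fractions, so the
    averaged excess is close to zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hx = frac * (x.max() - x.min())
    hy = frac * (y.max() - y.min())

    # Boolean (n, n) matrices: row k marks the points inside point k's window.
    in_x = np.abs(x[:, None] - x[None, :]) <= hx
    in_y = np.abs(y[:, None] - y[None, :]) <= hy

    p_x = in_x.mean(axis=1)            # marginal fraction near x[k]
    p_y = in_y.mean(axis=1)            # marginal fraction near y[k]
    p_xy = (in_x & in_y).mean(axis=1)  # joint fraction near (x[k], y[k])

    # Average the local excess over all sample points.
    return float(np.mean(p_xy - p_x * p_y))


if __name__ == "__main__":
    n = 400
    x = np.linspace(-1.0, 1.0, n)
    y = x ** 2  # non-monotone: Pearson correlation is essentially zero here

    rng = np.random.default_rng(0)
    x_ind = rng.uniform(-1.0, 1.0, n)
    y_ind = rng.uniform(0.0, 1.0, n)  # drawn independently of x_ind

    print("Pearson, y = x^2:      ", np.corrcoef(x, y)[0, 1])
    print("local score, y = x^2:  ", local_dependence(x, y))
    print("local score, independent:", local_dependence(x_ind, y_ind))
```

On the symmetric parabola, the Pearson correlation vanishes while the averaged local score stays clearly positive. For the independent pair the score is close to zero, up to a small positive bias because each point counts itself in its own window.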