Paper title
Confident Clustering via PCA Compression Ratio and Its Application to Single-cell RNA-seq Analysis
Authors
Abstract
Unsupervised clustering algorithms for vectors have been widely used in machine learning. Many applications, including the biological data we study in this paper, contain boundary datapoints that show combined properties of two underlying clusters and can lower the performance of traditional clustering algorithms. We develop a confident clustering method that aims to diminish the influence of these datapoints and improve the clustering results. Concretely, for a list of datapoints, we give two clustering results. The first-round clustering attempts to classify only pure vectors, with high confidence. Based on that result, the second round classifies more vectors with less confidence. We validate our algorithm on single-cell RNA-seq data, a powerful and widely used tool in biology. Our confident clustering achieves high accuracy on the datasets we tested. In addition, unlike traditional clustering methods in single-cell analysis, confident clustering shows high stability under different choices of parameters.
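The two-round idea can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify how the PCA compression ratio is computed, so the sketch swaps in a simple distance margin between the nearest and second-nearest k-means centers as a stand-in confidence score. The function names (`kmeans`, `margin_labels`, `confident_clustering`), the thresholds `high` and `low`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def pairwise_dist(X, centers):
    # Euclidean distance from every point to every center, shape (n, k).
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

def kmeans(X, k, n_iter=50):
    # Plain Lloyd's k-means with a deterministic farthest-point initialization.
    centers = [X[0]]
    for _ in range(1, k):
        d = pairwise_dist(X, np.array(centers)).min(axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = pairwise_dist(X, centers).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def margin_labels(X, centers, threshold):
    # Stand-in confidence (an assumption, not the PCA compression ratio):
    # margin = 1 - d_nearest / d_second_nearest, which is close to 0 for a
    # point midway between two centers and close to 1 for a "pure" point.
    d = pairwise_dist(X, centers)
    nearest = d.argmin(axis=1)
    d_sorted = np.sort(d, axis=1)
    margin = 1.0 - d_sorted[:, 0] / np.maximum(d_sorted[:, 1], 1e-12)
    # Points below the threshold stay unassigned (label -1).
    return np.where(margin >= threshold, nearest, -1)

def confident_clustering(X, k, high=0.6, low=0.1):
    # Requires k >= 2 so that a second-nearest center exists.
    X = np.asarray(X, dtype=float)
    centers = kmeans(X, k)
    # Round 1: label only points with a high margin.
    round1 = margin_labels(X, centers, high)
    # Re-estimate each center from its confident points only, so that
    # boundary points no longer pull the centers toward each other.
    refined = centers.copy()
    for j in range(k):
        if np.any(round1 == j):
            refined[j] = X[round1 == j].mean(axis=0)
    # Round 2: label more points under a looser threshold, keeping round 1.
    round2 = margin_labels(X, refined, low)
    round2 = np.where(round1 >= 0, round1, round2)
    return round1, round2

# Toy data: two tight groups plus three boundary points between them.
group_a = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
group_b = [(x + 10.0, y) for x, y in group_a]
boundary = [(4.0, 0.0), (6.0, 0.0), (5.0, 0.0)]
X_demo = np.array(group_a + group_b + boundary)
round1, round2 = confident_clustering(X_demo, k=2)
```

In this toy setup the first round leaves all three boundary points unassigned, and the second round assigns the two that lean toward one group while the exact midpoint stays unlabeled. The thresholds here are arbitrary; the abstract's claim of stability under parameter choices refers to the paper's own method, not to this stand-in.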