Simple and Scalable Sparse k-means Clustering via Feature Ranking

Paper Title

Simple and Scalable Sparse k-means Clustering via Feature Ranking

Authors

Zhiyue Zhang, Kenneth Lange, Jason Xu

Abstract

Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice.
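The abstract describes the framework only at a high level. The pure-Python sketch below shows one plausible reading of "sparse k-means via feature ranking". Its assumptions are ours, not the paper's. Features are standardized to zero mean and unit variance. Each feature l is scored by its between-cluster sum of squares, sum_j n_j * mean_jl^2, which on centered data equals the reduction in within-cluster sum of squares for that feature. Only the top `s` features are used when assigning points to centers, and centers are seeded deterministically by farthest-first traversal. The function names (`standardize`, `sq_dist`, `farthest_first_init`, `sparse_kmeans_ranked`) are hypothetical and do not come from the authors' code.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def standardize(X: Sequence[Sequence[float]]) -> Matrix:
    """Center each feature to mean 0 and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for l in range(p):
        col = [float(row[l]) for row in X]
        mu = sum(col) / n
        sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
        for i in range(n):
            # A constant feature carries no cluster information, so map it to 0.
            Z[i][l] = (col[i] - mu) / sd if sd > 0 else 0.0
    return Z


def sq_dist(a: Sequence[float], b: Sequence[float], features: Sequence[int]) -> float:
    """Squared Euclidean distance restricted to the given feature indices."""
    return sum((a[l] - b[l]) ** 2 for l in features)


def farthest_first_init(Z: Matrix, k: int) -> Matrix:
    """Hypothetical seeding: start at Z[0], then add the point farthest from all chosen centers."""
    all_features = range(len(Z[0]))
    idx = [0]
    while len(idx) < k:
        far = max(range(len(Z)),
                  key=lambda i: min(sq_dist(Z[i], Z[c], all_features) for c in idx))
        idx.append(far)
    return [list(Z[i]) for i in idx]


def sparse_kmeans_ranked(X: Sequence[Sequence[float]], k: int, s: int,
                         max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch: Lloyd-style k-means that keeps only the s top-ranked features.

    Returns (cluster labels, sorted indices of the selected features).
    """
    Z = standardize(X)
    n, p = len(Z), len(Z[0])
    centers = farthest_first_init(Z, k)
    selected = list(range(p))  # the first pass uses every feature
    prev_labels: Optional[List[int]] = None
    labels: List[int] = []
    for _ in range(max_iter):
        # 1. Assign each point to its nearest center, using selected features only.
        labels = [min(range(k), key=lambda j: sq_dist(Z[i], centers[j], selected))
                  for i in range(n)]
        # 2. Recompute cluster means over all features; an empty cluster keeps its old center.
        sizes = [labels.count(j) for j in range(k)]
        for j in range(k):
            if sizes[j] > 0:
                members = [Z[i] for i in range(n) if labels[i] == j]
                centers[j] = [sum(col) / sizes[j] for col in zip(*members)]
        # 3. Score feature l by its between-cluster sum of squares, sum_j n_j * mean_jl^2.
        #    Since Z is centered, this is the drop in within-cluster SS for feature l
        #    relative to putting every point in one cluster.
        scores = [sum(sizes[j] * centers[j][l] ** 2 for j in range(k)) for l in range(p)]
        # 4. Keep the s highest-scoring features (the stable sort breaks ties by index).
        new_selected = sorted(range(p), key=lambda l: scores[l], reverse=True)[:s]
        # Stop once both the partition and the feature set have stopped changing.
        if labels == prev_labels and set(new_selected) == set(selected):
            break
        prev_labels, selected = labels, new_selected
    return labels, sorted(selected)


if __name__ == "__main__":
    # Features 0 and 1 separate two groups; features 2 and 3 are high-variance noise.
    X = [[-1, -1, 10, 10], [-1, -1, 10, -10], [-1, -1, -10, 10], [-1, -1, -10, -10],
         [1, 1, 10, 10], [1, 1, 10, -10], [1, 1, -10, 10], [1, 1, -10, -10]]
    labels, features = sparse_kmeans_ranked(X, k=2, s=2)
    print(labels, features)  # prints [0, 0, 0, 0, 1, 1, 1, 1] [0, 1]
```

In the toy data, the two noise columns have 100 times the raw variance of the informative columns. After standardization all four columns share a common scale, so the ranking step can compare them directly, and it keeps features 0 and 1. The sparsity level `s` is fixed by hand here. This sketch does not cover how to choose it, or the paper's extensions to attribute subsets and partially observed data.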
