Paper Title
Simple and Scalable Sparse k-means Clustering via Feature Ranking
Paper Authors
Paper Abstract
Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice.