双随机子空间聚类

论文标题

双随机子空间聚类

Doubly Stochastic Subspace Clustering

论文作者

Lim, Derek, Vidal, René, Haeffele, Benjamin D.

论文摘要

许多最先进的子空间聚类方法遵循两步过程，首先在数据点之间构造亲和力矩阵，然后将光谱群集应用于此亲和力。对这些方法的大多数研究都集中在产生亲和力的第一步，该步骤通常利用线性子空间的自我表达属性，通常几乎没有考虑到产生最终聚类的光谱聚类步骤。此外，现有方法通常通过将临时选择或任意选择的后处理步骤应用于自我表达聚类公式产生的亲和力，从而获得光谱聚类步骤中使用的最终亲和力，这可能会对整体聚类性能产生重大影响。在这项工作中，我们通过学习数据的自我表达表示和一个与光谱聚类良好均衡的亲和力矩阵来统一这两个步骤。在我们提出的模型中，我们将亲和力矩阵限制为双重随机，这导致了一种原则性的亲和力矩阵归一化方法，同时还利用了光谱群集中双随机归一化的已知益处。我们开发了一个通用框架并得出了两个模型：一个共同学习自我表达的表示以及双重随机亲和力，而一个依次解决一个然后是另一个模型。此外，我们利用问题中利用稀疏性为顺序求解器开发快速积分的方法，该方法可以在大型数据集上有效计算。实验表明，我们的方法在计算机视觉中的许多常见数据集上实现了最新的子空间聚类性能。

Many state-of-the-art subspace clustering methods follow a two-step process by first constructing an affinity matrix between data points and then applying spectral clustering to this affinity. Most of the research into these methods focuses on the first step of generating the affinity, which often exploits the self-expressive property of linear subspaces, with little consideration typically given to the spectral clustering step that produces the final clustering. Moreover, existing methods often obtain the final affinity that is used in the spectral clustering step by applying ad-hoc or arbitrarily chosen postprocessing steps to the affinity generated by a self-expressive clustering formulation, which can have a significant impact on the overall clustering performance. In this work, we unify these two steps by learning both a self-expressive representation of the data and an affinity matrix that is well-normalized for spectral clustering. In our proposed models, we constrain the affinity matrix to be doubly stochastic, which results in a principled method for affinity matrix normalization while also exploiting known benefits of doubly stochastic normalization in spectral clustering. We develop a general framework and derive two models: one that jointly learns the self-expressive representation along with the doubly stochastic affinity, and one that sequentially solves for one then the other. Furthermore, we leverage sparsity in the problem to develop a fast active-set method for the sequential solver that enables efficient computation on large datasets. Experiments show that our method achieves state-of-the-art subspace clustering performance on many common datasets in computer vision.

下载PDF全文

下载文献需遵守相关版权规定

论文标题