大规模随机块模型中的随机光谱聚类

论文标题

大规模随机块模型中的随机光谱聚类

Randomized Spectral Clustering in Large-Scale Stochastic Block Models

论文作者

Zhang, Hai, Guo, Xiao, Chang, Xiangyu

论文摘要

光谱聚类一直是网络中社区检测的广泛使用方法之一。但是，大规模网络为其中的特征值分解带来了计算挑战。在本文中，我们从统计角度研究了使用随机素描算法研究光谱聚类，我们通常假设网络数据是从不一定是完整等级的随机块模型中生成的。为此，我们首先使用最近开发的素描算法来获得两种随机光谱群集算法，即基于随机投影和基于随机抽样的光谱群集。然后，我们根据人口邻接矩阵的近似误差，错误分类误差以及链接概率矩阵的估计误差研究所得算法的理论界限。事实证明，在温和条件下，随机光谱聚类算法与原始光谱聚类算法的理论界限相同。我们还将结果扩展到了经度校正的随机块模型。数值实验支持我们的理论发现，并显示随机方法的效率。开发了一个名为rclust的新R软件包，并提供给公众。

Spectral clustering has been one of the widely used methods for community detection in networks. However, large-scale networks bring computational challenges to the eigenvalue decomposition therein. In this paper, we study the spectral clustering using randomized sketching algorithms from a statistical perspective, where we typically assume the network data are generated from a stochastic block model that is not necessarily of full rank. To do this, we first use the recently developed sketching algorithms to obtain two randomized spectral clustering algorithms, namely, the random projection-based and the random sampling-based spectral clustering. Then we study the theoretical bounds of the resulting algorithms in terms of the approximation error for the population adjacency matrix, the misclassification error, and the estimation error for the link probability matrix. It turns out that, under mild conditions, the randomized spectral clustering algorithms lead to the same theoretical bounds as those of the original spectral clustering algorithm. We also extend the results to degree-corrected stochastic block models. Numerical experiments support our theoretical findings and show the efficiency of randomized methods. A new R package called Rclust is developed and made available to the public.

下载PDF全文

下载文献需遵守相关版权规定

论文标题