论文标题
强大的次高斯主要组件分析和无宽度独立的沙滕包装
Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing
论文作者
论文摘要
我们为以下基本统计任务开发了两种方法:给定$ d $ d $ d $二维次高斯分布的$ε$否定的$ n $样品集,返回协方差矩阵的大约最高特征向量。我们的第一个鲁棒PCA算法在多项式时间内运行,返回$ 1-O(ε\logε^{ - 1})$ - 近似顶部特征值,并且基于简单的迭代过滤方法。我们的第二个达到近似差的近似因子,在近乎线性的时间内运行,并在温和的光谱间隙假设下进行样品复杂性。这些是第一种多项式时间算法,产生有关损坏的次高斯分布的协方差的非平凡信息,而无需额外的代数结构。 As a key technical tool, we develop the first width-independent solvers for Schatten-$p$ norm packing semidefinite programs, giving a $(1 + ε)$-approximate solution in $O(p\log(\tfrac{nd}ε)ε^{-1})$ input-sparsity time iterations (where $n$, $d$ are problem dimensions).
We develop two methods for the following fundamental statistical task: given an $ε$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a $1 - O(ε\logε^{-1})$-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments. As a key technical tool, we develop the first width-independent solvers for Schatten-$p$ norm packing semidefinite programs, giving a $(1 + ε)$-approximate solution in $O(p\log(\tfrac{nd}ε)ε^{-1})$ input-sparsity time iterations (where $n$, $d$ are problem dimensions).