Paper Title
Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent
Paper Authors
Paper Abstract
Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning, and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors, and then optimize these factors directly via simple iterative methods such as gradient descent and alternating minimization. Despite nonconvexity, recent literature has shown that these simple heuristics in fact achieve linear convergence, when initialized properly, for a growing number of problems of interest. However, upon closer examination, existing approaches can still be computationally expensive, especially for ill-conditioned matrices: the convergence rate of gradient descent depends linearly on the condition number of the low-rank matrix, while the per-iteration cost of alternating minimization is often prohibitive for large matrices. The goal of this paper is to set forth a competitive algorithmic approach, dubbed Scaled Gradient Descent (ScaledGD), which can be viewed as pre-conditioned or diagonally-scaled gradient descent, where the pre-conditioners are adaptive and iteration-varying with minimal computational overhead. With tailored variants for low-rank matrix sensing, robust principal component analysis, and matrix completion, we theoretically show that ScaledGD achieves the best of both worlds: it converges linearly at a rate independent of the condition number of the low-rank matrix, similar to alternating minimization, while maintaining the low per-iteration cost of gradient descent. Our analysis is also applicable to general loss functions that are restricted strongly convex and smooth over low-rank matrices. To the best of our knowledge, ScaledGD is the first algorithm that provably has such properties over a wide range of low-rank matrix estimation tasks.
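For concreteness, the sketch below illustrates the kind of factor-wise preconditioning the abstract describes, applied to a toy, fully observed least-squares objective f(X, Y) = 0.5 * ||X Y^T - M||_F^2. The synthetic ill-conditioned matrix, the step size, the problem dimensions, and the perturbed spectral initialization are all assumptions made for illustration; they are not the paper's exact setup, which covers matrix sensing, robust PCA, and matrix completion.

```python
import numpy as np

# Minimal sketch of a ScaledGD-style update for a factorized model M ≈ X @ Y.T
# under a fully observed least-squares loss (illustrative assumptions only).

rng = np.random.default_rng(0)
n1, n2, r = 100, 80, 5

# Synthetic ground-truth rank-r matrix with a large condition number (kappa = 100).
U0, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V0, _ = np.linalg.qr(rng.standard_normal((n2, r)))
sigma = np.logspace(2, 0, r)           # singular values from 100 down to 1
M = (U0 * sigma) @ V0.T

# Perturbed spectral-style initialization via a truncated SVD (assumed setup).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
X = U[:, :r] * np.sqrt(s[:r]) + 0.1 * rng.standard_normal((n1, r))
Y = Vt[:r].T * np.sqrt(s[:r]) + 0.1 * rng.standard_normal((n2, r))

eta = 0.5  # step size (illustrative choice)

for _ in range(100):
    residual = X @ Y.T - M
    grad_X = residual @ Y              # gradient of f with respect to X
    grad_Y = residual.T @ X            # gradient of f with respect to Y
    # Scaled update: precondition each factor's gradient by the inverse Gram
    # matrix of the other factor; both Gram matrices are only r x r.
    X_new = X - eta * grad_X @ np.linalg.inv(Y.T @ Y)
    Y_new = Y - eta * grad_Y @ np.linalg.inv(X.T @ X)
    X, Y = X_new, Y_new

print("relative error:", np.linalg.norm(X @ Y.T - M) / np.linalg.norm(M))
```

The preconditioners in this sketch are the r x r Gram matrices of the current factors, so forming and inverting them adds only O(r^3) work per iteration on top of the usual gradient computation, which is consistent with the abstract's claim that the per-iteration cost stays close to that of plain gradient descent.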