On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Title

On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

Authors

Wu, Denny; Xu, Ji

Abstract

We consider the linear model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta}_\star + \boldsymbol{\varepsilon}$ with $\mathbf{X}\in \mathbb{R}^{n\times p}$ in the overparameterized regime $p>n$. We estimate $\boldsymbol{\beta}_\star$ via generalized (weighted) ridge regression: $\hat{\boldsymbol{\beta}}_\lambda = \left(\mathbf{X}^T\mathbf{X} + \lambda\boldsymbol{\Sigma}_w\right)^\dagger \mathbf{X}^T\mathbf{y}$, where $\boldsymbol{\Sigma}_w$ is the weighting matrix. Under a random design setting with general data covariance $\boldsymbol{\Sigma}_x$ and an anisotropic prior on the true coefficients, $\mathbb{E}\boldsymbol{\beta}_\star\boldsymbol{\beta}_\star^T = \boldsymbol{\Sigma}_\beta$, we provide an exact characterization of the prediction risk $\mathbb{E}(y-\mathbf{x}^T\hat{\boldsymbol{\beta}}_\lambda)^2$ in the proportional asymptotic limit $p/n\rightarrow \gamma\in (1,\infty)$. Our general setup leads to a number of interesting findings. We outline precise conditions that decide the sign of the optimal setting $\lambda_{\rm opt}$ for the ridge parameter $\lambda$ and confirm the implicit $\ell_2$ regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that $\lambda_{\rm opt}$ can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when both $\mathbf{X}$ and $\boldsymbol{\beta}_\star$ are anisotropic. Finally, we determine the optimal weighting matrix $\boldsymbol{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and the optimally regularized ($\lambda = \lambda_{\rm opt}$) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
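The estimator and risk above can be explored numerically with a minimal NumPy sketch. It is not the paper's analytic characterization (which holds in the proportional limit), only a finite-$n$, finite-$p$ Monte Carlo estimate under assumptions the abstract does not state: $\boldsymbol{\Sigma}_w$ is symmetric positive definite, and the rows of $\mathbf{X}$, the coefficients $\boldsymbol{\beta}_\star$, and the noise are Gaussian. The helper names `weighted_ridge` and `mc_risk` are hypothetical. Instead of forming the $p\times p$ pseudoinverse, the sketch whitens the design by $\boldsymbol{\Sigma}_w^{-1/2}$ and solves an $n\times n$ system; this equals the stated formula whenever $\mathbf{X}^T\mathbf{X} + \lambda\boldsymbol{\Sigma}_w$ is invertible, and at $\lambda = 0$ it returns the $\lambda\to 0$ limit (the minimum $\boldsymbol{\Sigma}_w$-norm interpolator).

```python
import numpy as np


def weighted_ridge(X, y, lam, Sigma_w):
    """Generalized (weighted) ridge estimate (X^T X + lam * Sigma_w)^{-1} X^T y.

    Sketch assumption: Sigma_w is symmetric positive definite. With the
    whitened design Z = X Sigma_w^{-1/2}, the push-through identity gives
        (X^T X + lam Sigma_w)^{-1} X^T y = Sigma_w^{-1/2} Z^T (Z Z^T + lam I_n)^{-1} y,
    so only an n x n system is solved. At lam = 0 this returns the lam -> 0
    limit, i.e. the interpolator with the smallest Sigma_w-weighted norm.
    Negative lam is allowed as long as Z Z^T + lam I_n stays nonsingular.
    """
    evals, evecs = np.linalg.eigh(Sigma_w)
    W_inv_half = (evecs / np.sqrt(evals)) @ evecs.T  # Sigma_w^{-1/2}
    Z = X @ W_inv_half
    n = X.shape[0]
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(n), y)
    return W_inv_half @ (Z.T @ alpha)


def mc_risk(n, p, lam, Sigma_x, Sigma_beta, Sigma_w, sigma=1.0, trials=20, seed=0):
    """Monte Carlo estimate of the prediction risk E(y - x^T beta_hat)^2.

    Assumptions made for this sketch only: rows of X are N(0, Sigma_x),
    beta_star ~ N(0, Sigma_beta) so that E beta beta^T = Sigma_beta, and the
    noise is N(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    Lx = np.linalg.cholesky(Sigma_x)
    Lb = np.linalg.cholesky(Sigma_beta)
    risks = []
    for _ in range(trials):
        X = rng.standard_normal((n, p)) @ Lx.T
        beta_star = Lb @ rng.standard_normal(p)
        y = X @ beta_star + sigma * rng.standard_normal(n)
        err = weighted_ridge(X, y, lam, Sigma_w) - beta_star
        # For a fresh x ~ N(0, Sigma_x) and fresh noise, conditional on the fit:
        # E(y - x^T beta_hat)^2 = err^T Sigma_x err + sigma^2.
        risks.append(err @ Sigma_x @ err + sigma**2)
    return float(np.mean(risks))


if __name__ == "__main__":
    n, p = 200, 400                              # aspect ratio gamma = p / n = 2
    Sigma_x = np.diag(np.linspace(0.2, 3.0, p))  # hypothetical anisotropic covariance
    Sigma_beta = np.eye(p) / p                   # hypothetical isotropic prior
    Sigma_w = np.eye(p)                          # identity weight = standard ridge
    for lam in (-2.0, 0.0, 2.0, 20.0):
        risk = mc_risk(n, p, lam, Sigma_x, Sigma_beta, Sigma_w, sigma=0.5)
        print(f"lambda={lam:6.1f}  risk~{risk:.4f}")
```

Setting $\boldsymbol{\Sigma}_w = \mathbf{I}$ recovers standard ridge regression. Sweeping $\lambda$ (including small negative values) is one way to probe numerically where $\lambda_{\rm opt}$ sits for a given pair $(\boldsymbol{\Sigma}_x, \boldsymbol{\Sigma}_\beta)$, though such finite-sample estimates only approximate the proportional-limit risk.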
