Paper Title
Optimal Regularization Can Mitigate Double Descent
Paper Authors
Paper Abstract
Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such as the sample size and model size. This striking phenomenon, often referred to as "double descent", has raised the question of whether we need to rethink our current understanding of generalization. In this work, we study whether the double-descent phenomenon can be avoided by using optimal regularization. Theoretically, we prove that for certain linear regression models with isotropic data distribution, optimally-tuned $\ell_2$ regularization achieves monotonic test performance as we grow either the sample size or the model size. We also demonstrate empirically that optimally-tuned $\ell_2$ regularization can mitigate double descent for more general models, including neural networks. Our results suggest that it may also be informative to study the test risk scalings of various algorithms in the context of appropriately tuned regularization.
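The following is a minimal sketch, not the authors' code, illustrating the setting the abstract describes: linear regression with isotropic Gaussian data, where the unregularized (minimum-norm) least-squares fit shows a test-risk peak near the interpolation threshold (sample size n close to dimension d), while ridge regression with a well-chosen $\ell_2$ penalty behaves much more benignly. All dimensions, noise levels, and the lambda grid are illustrative assumptions, and "optimal tuning" is approximated by an oracle grid search on the test set.

```python
# Illustrative sketch of double descent and its mitigation by tuned ridge
# regression (assumed setup; not the authors' experiments).
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n_test = 50, 0.5, 2000            # feature dim, noise level, test-set size
beta = rng.normal(size=d) / np.sqrt(d)      # ground-truth coefficients

def ridge_fit(X, y, lam):
    """Closed-form ridge: argmin_w ||Xw - y||^2 + lam * ||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def test_risk(w, X_test, y_test):
    return np.mean((X_test @ w - y_test) ** 2)

X_test = rng.normal(size=(n_test, d))
y_test = X_test @ beta + sigma * rng.normal(size=n_test)

lam_grid = np.logspace(-4, 2, 20)
for n in [10, 25, 40, 50, 60, 100, 200]:    # sample sizes crossing n = d
    X = rng.normal(size=(n, d))
    y = X @ beta + sigma * rng.normal(size=n)

    # Unregularized solution: minimum-norm least squares via pseudo-inverse.
    w_min_norm = np.linalg.pinv(X) @ y
    risk_unreg = test_risk(w_min_norm, X_test, y_test)

    # "Optimally tuned" lambda: best grid value by test risk (oracle choice,
    # used only to illustrate the abstract's claim).
    risk_ridge = min(test_risk(ridge_fit(X, y, lam), X_test, y_test)
                     for lam in lam_grid)
    print(f"n={n:4d}  unregularized risk={risk_unreg:.3f}  "
          f"tuned-ridge risk={risk_ridge:.3f}")
```

Running this sketch, the unregularized risk typically spikes around n = d = 50 and then falls again (the non-monotonic "double descent" curve), whereas the tuned-ridge risk decreases essentially monotonically with n, in line with the behavior the abstract proves for the isotropic linear model.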