Paper Title
A finite sample analysis of the benign overfitting phenomenon for ridge function estimation
Paper Authors
Paper Abstract
Extensive numerical experiments in large-scale machine learning have recently uncovered a rather counterintuitive phase transition, as a function of the ratio between the sample size and the number of parameters in the model. As the number of parameters $p$ approaches the sample size $n$, the generalisation error increases, but, surprisingly, it starts decreasing again past the threshold $p=n$. This phenomenon, brought to the attention of the theoretical community in \cite{belkin2019reconciling}, has lately been thoroughly investigated, mostly for models simpler than deep neural networks, such as the linear model with the parameter taken to be the minimum-norm solution of the least-squares problem: first in the asymptotic regime where $p$ and $n$ tend to infinity, see e.g. \cite{hastie2019surprises}, and more recently in the finite-dimensional regime, specifically for linear models \cite{bartlett2020benign}, \cite{tsigler2020benign}, \cite{lecue2022geometrical}. In the present paper, we propose a finite sample analysis of non-linear models of \textit{ridge} type, in which we investigate the \textit{overparametrised regime} of the double descent phenomenon for both the \textit{estimation} problem and the \textit{prediction} problem. Our results provide a precise analysis of the distance between the best estimator and the true parameter, as well as a generalisation bound that complements the recent works \cite{bartlett2020benign} and \cite{chinot2020benign}. Our analysis is based on tools closely related to the continuous Newton method \cite{neuberger2007continuous} and on a refined quantitative analysis of the performance in prediction of the minimum $\ell_2$-norm solution.
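For concreteness, here is a minimal sketch of the two objects named in the abstract; the notation is assumed for illustration and is not taken from the paper itself. A ridge function depends on the covariates only through a one-dimensional projection, $x \mapsto f(\langle x, \beta^* \rangle)$ for some direction $\beta^* \in \mathbb{R}^p$ and link function $f: \mathbb{R} \to \mathbb{R}$. In the overparametrised regime $p > n$, writing $X \in \mathbb{R}^{n \times p}$ for the design matrix and $y \in \mathbb{R}^n$ for the responses, the minimum $\ell_2$-norm solution of the least-squares problem is the interpolator
\[
\hat{\beta} \;\in\; \arg\min_{\beta \in \mathbb{R}^p} \|\beta\|_2 \quad \text{s.t.} \quad X\beta = y,
\qquad
\hat{\beta} \;=\; X^\top (X X^\top)^{-1} y \;=\; X^{+} y,
\]
where $X^{+}$ denotes the Moore–Penrose pseudoinverse; the closed form assumes $X X^\top$ is invertible, which holds for generic designs when $p \ge n$.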