Title
Fractional ridge regression: a fast, interpretable reparameterization of ridge regression
Authors
Abstract
Ridge regression (RR) is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using RR is the need to set a hyperparameter ($α$) that controls the amount of regularization. Cross-validation is typically used to select the best $α$ from a set of candidates. However, efficient and appropriate selection of $α$ can be challenging, particularly where large amounts of data are analyzed. Because the selected $α$ depends on the scale of the data and predictors, it is not straightforwardly interpretable. Here, we propose to reparameterize RR in terms of the ratio $γ$ between the L2-norms of the regularized and unregularized coefficients. This approach, called fractional RR (FRR), has several benefits: the solutions obtained for different $γ$ are guaranteed to vary, guarding against wasted calculations, and automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. We provide an algorithm to solve FRR, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems, and delivers results that are straightforward to interpret and compare across models and datasets.
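The reparameterization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference `fracridge` implementation: the function name `fractional_ridge`, the α grid, and the interpolation strategy are assumptions for exposition. The idea is that, in the SVD basis, the ridge coefficient norm decreases monotonically with α, so for each target fraction γ one can interpolate to find the α whose solution has ||β(α)|| / ||β_OLS|| ≈ γ.

```python
import numpy as np

def fractional_ridge(X, y, fracs):
    """Illustrative sketch of fractional ridge regression (FRR).

    For each target fraction f in `fracs`, find a ridge penalty alpha
    such that ||beta(alpha)|| / ||beta_OLS|| is approximately f, and
    return the corresponding coefficients (one column per fraction).
    This is a hypothetical helper, not the authors' fracridge code.
    """
    # Thin SVD of the design matrix: X = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y                        # projections of y onto left singular vectors
    ols_norm = np.linalg.norm(uty / s)   # ||beta|| at alpha = 0 (the OLS solution)

    # Coefficient norm fractions on a log-spaced alpha grid (0 included).
    # The scale factor mean(s**2) is an assumed heuristic to cover the
    # relevant regularization range for this particular X.
    alphas = np.concatenate([[0.0], np.logspace(-6, 6, 200) * np.mean(s**2)])
    norms = np.array(
        [np.linalg.norm(s / (s**2 + a) * uty) for a in alphas]
    ) / ols_norm

    coefs = []
    for f in fracs:
        # `norms` decreases monotonically in alpha, so reverse both
        # arrays to interpolate the alpha that yields fraction f.
        a = np.interp(f, norms[::-1], alphas[::-1])
        coefs.append(Vt.T @ (s / (s**2 + a) * uty))
    return np.array(coefs).T

# Example on synthetic data: the returned solutions have coefficient
# norms close to the requested fractions of the OLS norm.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = X @ rng.standard_normal(10) + rng.standard_normal(100)
B = fractional_ridge(X, y, fracs=[0.3, 0.7, 1.0])
```

Because each γ maps to a distinct point on the monotone norm curve, the solutions for different fractions are guaranteed to differ, which is the property the abstract highlights as guarding against wasted computation.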