Paper title
Robust supervised learning with coordinate gradient descent
Paper authors
Paper abstract
This paper considers the problem of supervised learning with linear methods when both features and labels can be corrupted, either in the form of heavy-tailed data and/or corrupted rows. We introduce a combination of coordinate gradient descent as a learning algorithm together with robust estimators of the partial derivatives. This leads to robust statistical learning methods whose numerical complexity is nearly identical to that of non-robust methods based on empirical risk minimization. The main idea is simple: while robust learning with gradient descent requires paying the computational cost of robustly estimating the whole gradient before updating all parameters, coordinate gradient descent can update a parameter immediately using a robust estimator of a single partial derivative. We prove upper bounds on the generalization error of the algorithms derived from this idea, which control both the optimization and statistical errors, with and without a strong convexity assumption on the risk. Finally, we propose an efficient implementation of this approach in a new Python library called linlearn, and demonstrate through extensive numerical experiments that our approach introduces an interesting new compromise between robustness, statistical performance and numerical efficiency for this problem.
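
The central idea described in the abstract (one robustly estimated partial derivative per parameter update, rather than a robust estimate of the full gradient) can be illustrated with a minimal sketch. The code below is not the paper's algorithm nor the linlearn API: the median-of-means estimator, the least-squares loss, the step sizes and all function names are assumptions made purely for illustration.

import numpy as np

def median_of_means(values, n_blocks=10):
    # Robust estimate of the mean of `values`: average within random blocks,
    # then take the median of the block means (resistant to heavy tails/outliers).
    order = np.random.permutation(len(values))
    blocks = np.array_split(values[order], n_blocks)
    return np.median([block.mean() for block in blocks])

def robust_cgd_least_squares(X, y, n_epochs=50, n_blocks=10):
    # Coordinate gradient descent for linear least squares where each partial
    # derivative is estimated robustly just before the corresponding update.
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    residual = X @ w - y                   # cached residuals r_i = <x_i, w> - y_i
    # Coordinate-wise Lipschitz constants of the empirical quadratic risk.
    lipschitz = (X ** 2).mean(axis=0) + 1e-12
    for _ in range(n_epochs):
        for j in np.random.permutation(n_features):
            # Per-sample partial derivatives: d/dw_j of (1/2) r_i^2 = r_i * x_{ij}
            partials = residual * X[:, j]
            g_j = median_of_means(partials, n_blocks=n_blocks)
            step = g_j / lipschitz[j]
            w[j] -= step
            residual -= step * X[:, j]     # keep residuals consistent with the update
    return w

Swapping median_of_means for, say, a trimmed mean would be a natural variant for rows corrupted by an adversary; the only point of the sketch is that a single coordinate can be updated right after one robust scalar estimate, without ever forming a robust full gradient.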