Paper Title
Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization
Paper Authors
Paper Abstract
To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
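For context, a minimal sketch (not part of the original abstract) of two objects it refers to: the standard input-output mutual information generalization bound in its usual Xu–Raginsky form, and the Gaussian-perturbed "surrogate" of the final iterate. The subgaussian assumption, the sample size $n$, the noise level $\tau$, and the symbols $W_T$, $S$ are notational assumptions introduced here for illustration only.

If the loss $\ell(w, Z)$ is $\sigma$-subgaussian for every $w$, the input-output mutual information bound reads
$$\bigl|\mathbb{E}\,\mathrm{gen}(W, S)\bigr| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},$$
where $S = (Z_1, \dots, Z_n)$ is the training sample and $W$ is the algorithm's output. The surrogate tactic analyzes, in place of the final iterate $W_T$ of gradient descent, the perturbed output
$$\widetilde{W}_T \;=\; W_T + \xi, \qquad \xi \sim \mathcal{N}(0, \tau^2 I_d),$$
for which $I(\widetilde{W}_T; S)$ is finite and can be bounded, at the cost of an additional optimization error controlled by $\tau$. The paper's negative results concern bounds of this general type and surrogates of this general form.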