Paper Title
An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives
Paper Authors
Paper Abstract
In this paper, we propose a practical online method for solving a class of distributionally robust optimization (DRO) problems with non-convex objectives, which has important applications in machine learning for improving the robustness of neural networks. In the literature, most methods for solving DRO are based on stochastic primal-dual methods. However, primal-dual methods for DRO suffer from several drawbacks: (1) manipulating a high-dimensional dual variable whose dimension equals the size of the data is time-consuming; (2) they are not friendly to online learning, where data arrives sequentially. To address these issues, we consider a class of DRO with a KL divergence regularization on the dual variables, transform the min-max problem into a compositional minimization problem, and propose practical duality-free online stochastic methods that do not require a large mini-batch size. We establish state-of-the-art complexities for the proposed methods both with and without a Polyak-Łojasiewicz (PL) condition on the objective. Empirical studies on large-scale deep learning tasks (i) demonstrate that our method can speed up training by more than 2 times over baseline methods, saving days of training time on a large-scale dataset with $\sim$265K images, and (ii) verify the superior performance of DRO over Empirical Risk Minimization (ERM) on imbalanced datasets. Of independent interest, the proposed method can also be used to solve a family of stochastic compositional problems with state-of-the-art complexities.
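The min-max to compositional reduction mentioned in the abstract rests on a standard closed-form fact: with a KL regularizer toward the uniform distribution, the inner maximization over the simplex is solved by a softmax of the losses, and its optimal value is a log-mean-exp of the losses. A minimal numerical sketch of this equivalence follows (the symbol `lam` and the function names are our illustration, not notation from the paper):

```python
import math

def dro_objective_primal_dual(losses, lam):
    """Inner max over the simplex: sum_i p_i*l_i - lam*KL(p || uniform),
    evaluated at its closed-form maximizer p_i proportional to exp(l_i/lam)."""
    n = len(losses)
    weights = [math.exp(l / lam) for l in losses]
    z = sum(weights)
    p = [w / z for w in weights]                          # softmax of losses
    linear = sum(pi * li for pi, li in zip(p, losses))    # sum_i p_i * l_i
    kl = sum(pi * math.log(pi * n) for pi in p)           # KL(p || uniform)
    return linear - lam * kl

def dro_objective_compositional(losses, lam):
    """Equivalent compositional form: lam * log( mean_i exp(l_i / lam) ),
    i.e., f(g(w)) with f(u) = lam*log(u) and g(w) = E[exp(l(w)/lam)]."""
    n = len(losses)
    return lam * math.log(sum(math.exp(l / lam) for l in losses) / n)

# The two forms agree, which is what lets the min-max problem be
# minimized directly without maintaining the n-dimensional dual variable.
losses, lam = [0.3, 1.2, 0.7, 2.5], 0.5
assert abs(dro_objective_primal_dual(losses, lam)
           - dro_objective_compositional(losses, lam)) < 1e-9
```

Because the compositional form depends on the data only through the expectation inside the logarithm, it can be estimated from a small online mini-batch, which is the structural property the duality-free methods in the paper exploit.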