Paper Title
An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives
Paper Authors
Paper Abstract
In this paper, we propose a practical online method for solving a class of distributionally robust optimization (DRO) problems with non-convex objectives, which has important applications in machine learning for improving the robustness of neural networks. In the literature, most methods for solving DRO are based on stochastic primal-dual methods. However, primal-dual methods for DRO suffer from several drawbacks: (1) manipulating a high-dimensional dual variable whose dimension equals the size of the data is time-consuming; (2) they are not friendly to online learning, where data arrives sequentially. To address these issues, we consider a class of DRO with a KL divergence regularization on the dual variables, transform the min-max problem into a compositional minimization problem, and propose practical duality-free online stochastic methods that do not require a large mini-batch size. We establish state-of-the-art complexities for the proposed methods both with and without a Polyak-Łojasiewicz (PL) condition on the objective. Empirical studies on large-scale deep learning tasks (i) demonstrate that our method can speed up training by more than 2 times over baseline methods, saving days of training time on a large-scale dataset with $\sim$265K images, and (ii) verify the superior performance of DRO over Empirical Risk Minimization (ERM) on imbalanced datasets. Of independent interest, the proposed method can also be used to solve a family of stochastic compositional problems with state-of-the-art complexities.
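The min-max to compositional reduction mentioned in the abstract rests on a standard closed-form fact: with a KL regularizer toward the uniform distribution, the inner maximization over the simplex is solved by a softmax of the losses, and its optimal value is a log-mean-exp of the losses. A minimal numerical sketch of this equivalence follows (the symbol `lam` and the function names are our illustration, not notation from the paper):

```python
import math

def dro_objective_primal_dual(losses, lam):
    """Inner max over the simplex: sum_i p_i*l_i - lam*KL(p || uniform),
    evaluated at its closed-form maximizer p_i proportional to exp(l_i/lam)."""
    n = len(losses)
    weights = [math.exp(l / lam) for l in losses]
    z = sum(weights)
    p = [w / z for w in weights]                          # softmax of losses
    linear = sum(pi * li for pi, li in zip(p, losses))    # sum_i p_i * l_i
    kl = sum(pi * math.log(pi * n) for pi in p)           # KL(p || uniform)
    return linear - lam * kl

def dro_objective_compositional(losses, lam):
    """Equivalent compositional form: lam * log( mean_i exp(l_i / lam) ),
    i.e., f(g(w)) with f(u) = lam*log(u) and g(w) = E[exp(l(w)/lam)]."""
    n = len(losses)
    return lam * math.log(sum(math.exp(l / lam) for l in losses) / n)

# The two forms agree, which is what lets the min-max problem be
# minimized directly without maintaining the n-dimensional dual variable.
losses, lam = [0.3, 1.2, 0.7, 2.5], 0.5
assert abs(dro_objective_primal_dual(losses, lam)
           - dro_objective_compositional(losses, lam)) < 1e-9
```

Because the compositional form depends on the data only through the expectation inside the logarithm, it can be estimated from a small online mini-batch, which is the structural property the duality-free methods in the paper exploit.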