Paper title
Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
Paper authors
Paper abstract
Langevin diffusion is a powerful method for nonconvex optimization that escapes local minima by injecting noise into the gradient. In particular, the temperature parameter controlling the noise level gives rise to a tradeoff between "global exploration" and "local exploitation", which correspond to high and low temperatures, respectively. To attain the advantages of both regimes, we propose to use replica exchange, which swaps between two Langevin diffusions with different temperatures. We theoretically analyze the acceleration effect of replica exchange from two perspectives: (i) the convergence in χ^2-divergence, and (ii) the large deviation principle. Such an acceleration effect allows us to approach the global minima faster. Furthermore, by discretizing the replica exchange Langevin diffusion, we obtain a discrete-time algorithm. For this algorithm, we quantify its discretization error in theory and demonstrate its acceleration effect in practice.
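The discrete-time scheme mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the update rule, so everything below is an assumption made for illustration: an Euler-Maruyama step of size `eta` for each replica, a Metropolis-style swap ratio derived from the product density proportional to exp(-U(x1)/tau_low - U(x2)/tau_high), a hypothetical `swap_intensity` parameter, and a toy double-well objective. It should not be read as the authors' exact algorithm.

```python
import numpy as np


def replica_exchange_langevin(U, grad_U, x_low, x_high, tau_low, tau_high,
                              eta=1e-2, swap_intensity=1.0,
                              n_steps=10_000, seed=0):
    """Sketch of discretized replica exchange Langevin dynamics.

    All parameter names are illustrative assumptions, not taken from the paper.
    Returns the lowest-energy point visited by the low-temperature replica,
    its energy, and the number of swaps performed.
    """
    rng = np.random.default_rng(seed)
    x_low = np.asarray(x_low, dtype=float).copy()
    x_high = np.asarray(x_high, dtype=float).copy()
    best_x, best_U = x_low.copy(), U(x_low)
    n_swaps = 0

    for _ in range(n_steps):
        # Euler-Maruyama step for each replica: gradient drift plus Gaussian
        # noise whose scale is set by that replica's temperature.
        x_low = (x_low - eta * grad_U(x_low)
                 + np.sqrt(2.0 * eta * tau_low) * rng.standard_normal(x_low.shape))
        x_high = (x_high - eta * grad_U(x_high)
                  + np.sqrt(2.0 * eta * tau_high) * rng.standard_normal(x_high.shape))

        # Metropolis-style swap ratio (assumed form): a swap is favored when
        # the high-temperature replica sits at lower energy than the low one.
        log_ratio = (1.0 / tau_low - 1.0 / tau_high) * (U(x_low) - U(x_high))
        accept = np.exp(min(0.0, log_ratio))  # equals min(1, exp(log_ratio))
        swap_prob = min(1.0, swap_intensity * eta) * accept

        if rng.random() < swap_prob:
            x_low, x_high = x_high, x_low
            n_swaps += 1

        # Track the best point seen by the low-temperature replica.
        u = U(x_low)
        if u < best_U:
            best_x, best_U = x_low.copy(), u

    return best_x, best_U, n_swaps


# Toy asymmetric double well (an assumption for illustration): the global
# minimum is near x = -1 and a shallower local minimum is near x = +1.
def double_well(x):
    return float(np.sum((x ** 2 - 1.0) ** 2 + 0.3 * x))


def double_well_grad(x):
    return 4.0 * x * (x ** 2 - 1.0) + 0.3
```

A call such as `replica_exchange_langevin(double_well, double_well_grad, [1.0], [1.0], tau_low=0.05, tau_high=1.0)` starts both replicas in the shallower well. The high-temperature replica explores across the barrier, and a swap can hand a lower-energy position to the low-temperature replica, which then refines it locally. Whether and how quickly this happens depends on the seed and on the assumed parameters.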