Title

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

Authors

Matilde Gargiani, Andrea Zanelli, Andrea Martinelli, Tyler Summers, John Lygeros

Abstract

Despite their success, policy gradient methods suffer from high variance of the gradient estimate, which can result in unsatisfactory sample complexity. Recently, numerous variance-reduced extensions of policy gradient methods with provably better sample complexity and competitive numerical performance have been proposed. After a compact survey on some of the main variance-reduced REINFORCE-type methods, we propose ProbAbilistic Gradient Estimation for Policy Gradient (PAGE-PG), a novel loopless variance-reduced policy gradient method based on a probabilistic switch between two types of updates. Our method is inspired by the PAGE estimator for supervised learning and leverages importance sampling to obtain an unbiased gradient estimator. We show that PAGE-PG enjoys a $\mathcal{O}(\epsilon^{-3})$ average sample complexity to reach an $\epsilon$-stationary solution, which matches the sample complexity of its most competitive counterparts under the same setting. A numerical evaluation confirms the competitive performance of our method on classical control tasks.
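
The abstract describes the estimator only at a high level. The sketch below illustrates the probabilistic switch it refers to on a toy one-step Gaussian-policy problem: with probability p the gradient estimate is restarted with a larger-batch REINFORCE-type estimate, and otherwise it is updated recursively with a small batch plus an importance-sampling correction. This is a minimal illustration under simplifying assumptions, not the paper's implementation; the quadratic toy reward and the values of N, B, p, and eta are hypothetical choices made here for the example.

    import numpy as np

    # Toy setup: one-step "trajectories" a ~ N(theta, 1), reward maximized at a = 2.
    # All concrete choices below are illustrative assumptions, not from the paper.
    rng = np.random.default_rng(0)

    def reward(a):
        return -(a - 2.0) ** 2

    def sample_actions(theta, n):
        # Gaussian policy with unit variance
        return theta + rng.standard_normal(n)

    def pg_estimates(theta, a):
        # Per-sample REINFORCE gradients: grad log pi(a; theta) * reward(a)
        return (a - theta) * reward(a)

    def importance_weight(a, theta_old, theta_new):
        # pi(a; theta_old) / pi(a; theta_new) for unit-variance Gaussians,
        # used to keep the recursive estimator unbiased
        return np.exp(-0.5 * (a - theta_old) ** 2 + 0.5 * (a - theta_new) ** 2)

    eta, p, N, B = 0.05, 0.2, 64, 8     # step size, switch prob., batch sizes
    theta = theta_prev = 0.0
    v = pg_estimates(theta, sample_actions(theta, N)).mean()  # initial estimate

    for t in range(200):
        if rng.random() < p:
            # Restart: large-batch REINFORCE-type gradient estimate
            v = pg_estimates(theta, sample_actions(theta, N)).mean()
        else:
            # Cheap recursive (SARAH-style) correction with a small batch,
            # re-weighting the old-policy gradient via importance sampling
            a = sample_actions(theta, B)
            w = importance_weight(a, theta_prev, theta)
            v = v + (pg_estimates(theta, a) - w * pg_estimates(theta_prev, a)).mean()
        theta_prev, theta = theta, theta + eta * v  # ascent on expected reward

    print(f"theta = {theta:.2f}  (optimum is 2.0)")

Because the switch is probabilistic rather than governed by a fixed inner-loop length, the method is "loopless": there is a single update loop, and the expensive restart happens only on a random fraction p of the iterations.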
