Paper Title
An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search
Paper Authors
Paper Abstract
Deep reinforcement learning (DRL) algorithms and evolution strategies (ES) have been applied to various tasks, showing excellent performance. The two have opposite properties: DRL offers good sample efficiency but poor stability, while ES offers the reverse. Recently, there have been attempts to combine these algorithms, but these methods rely entirely on a synchronous update scheme, which is not ideal for maximizing the benefits of the parallelism in ES. To address this challenge, an asynchronous update scheme is introduced, which enables good time efficiency and diverse policy exploration. In this paper, we introduce an Asynchronous Evolution Strategy-Reinforcement Learning (AES-RL) method that maximizes the parallel efficiency of ES and integrates it with policy gradient methods. Specifically, we propose 1) a novel framework to merge ES and DRL asynchronously and 2) various asynchronous update methods that take all the advantages of asynchronism, ES, and DRL, namely exploration and time efficiency, stability, and sample efficiency, respectively. The proposed framework and update methods are evaluated on continuous control benchmarks, showing superior performance as well as time efficiency compared to previous methods.
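To make the notion of an asynchronous update scheme concrete, the following is a minimal illustrative sketch, not the authors' AES-RL algorithm: a central server holds a mean policy parameter vector, parallel ES workers evaluate perturbations on their own schedule, and the server incorporates each evaluation as it arrives instead of waiting for a complete synchronous generation. The toy fitness function, dimensions, and step sizes are assumptions for illustration only, and the DRL (policy gradient) worker is omitted.

```python
# Minimal sketch of asynchronous ES-style updates (hypothetical, not AES-RL itself):
# workers push per-perturbation results to a queue; the server updates the mean
# policy one result at a time, so fast and slow workers never block each other.

import threading
import queue
import numpy as np

DIM = 8              # toy policy parameter dimension (assumption)
SIGMA = 0.1          # perturbation scale (assumption)
LR = 0.05            # server-side step size (assumption)
N_WORKERS = 4
N_RESULTS = 200      # total asynchronous results the server consumes

def evaluate(theta):
    """Toy fitness standing in for an episode return; higher is better."""
    return -np.sum((theta - 1.0) ** 2)

class Server:
    """Central parameter server holding the mean policy."""
    def __init__(self):
        self.mean = np.zeros(DIM)
        self.lock = threading.Lock()

    def get_mean(self):
        with self.lock:
            return self.mean.copy()

    def update(self, noise, fitness, baseline):
        # Asynchronous per-result update: move toward perturbations that
        # beat the baseline the worker observed, as soon as they arrive.
        with self.lock:
            if fitness > baseline:
                self.mean += LR * noise

results = queue.Queue()
server = Server()
stop = threading.Event()

def es_worker(rng):
    while not stop.is_set():
        mean = server.get_mean()
        noise = rng.normal(0.0, SIGMA, size=DIM)
        results.put((noise, evaluate(mean + noise), evaluate(mean)))

threads = [threading.Thread(target=es_worker,
                            args=(np.random.default_rng(i),), daemon=True)
           for i in range(N_WORKERS)]
for t in threads:
    t.start()

for _ in range(N_RESULTS):
    noise, fitness, baseline = results.get()
    server.update(noise, fitness, baseline)

stop.set()
print("final fitness:", evaluate(server.get_mean()))
```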