Paper Title
Ensemble Reinforcement Learning in Continuous Spaces -- A Hierarchical Multi-Step Approach for Policy Training
Paper Authors
Paper Abstract
Actor-critic deep reinforcement learning (DRL) algorithms have recently achieved prominent success in tackling various challenging reinforcement learning (RL) problems, particularly complex control tasks with high-dimensional continuous state and action spaces. Nevertheless, existing research has shown that actor-critic DRL algorithms often fail to explore their learning environments effectively, resulting in limited learning stability and performance. To address this limitation, several ensemble DRL algorithms have recently been proposed to boost exploration and stabilize the learning process. However, most existing ensemble algorithms do not explicitly train all base learners towards jointly optimizing the performance of the ensemble. In this paper, we propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method. This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration through stable inter-learner parameter sharing. The design of our new algorithm is verified theoretically. The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.
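The abstract names two ingredients without detailing them: base learners coupled through inter-learner parameter sharing, and a multi-step method used during training. Since the listing carries no code, the following is a minimal, hypothetical PyTorch sketch of how such ingredients might fit together; the class SharedTrunkEnsembleActor, the mean-action aggregation, and the n_step_td_target helper are illustrative assumptions, not the authors' actual algorithm.

```python
# Hypothetical sketch -- NOT the paper's method. It illustrates (i) actor
# heads that share one feature trunk (a simple form of inter-learner
# parameter sharing) and (ii) a multi-step (n-step) TD target of the kind
# commonly used to stabilize critic training.
import torch
import torch.nn as nn


class SharedTrunkEnsembleActor(nn.Module):
    """K actor heads on one shared feature trunk (assumed design)."""

    def __init__(self, obs_dim: int, act_dim: int, n_heads: int = 4, hidden: int = 64):
        super().__init__()
        # Shared trunk: parameters updated by every base learner.
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # One lightweight head per base learner.
        self.heads = nn.ModuleList([nn.Linear(hidden, act_dim) for _ in range(n_heads)])

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        # Each head proposes a bounded action; the ensemble acts with the mean.
        per_head = torch.stack([torch.tanh(head(h)) for head in self.heads])
        return per_head, per_head.mean(dim=0)


def n_step_td_target(rewards, bootstrap_value, gamma: float = 0.99):
    """Multi-step return: r_t + gamma*r_{t+1} + ... + gamma^n * V(s_{t+n})."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target


# Usage: propose a joint action for a batch of 16 observations, then form
# a 3-step target from a short reward sequence and a bootstrapped value.
actor = SharedTrunkEnsembleActor(obs_dim=8, act_dim=2)
per_head_actions, joint_action = actor(torch.randn(16, 8))
target = n_step_td_target([1.0, 0.5, 0.2], bootstrap_value=torch.zeros(16))
```

The trunk/head split is one plausible way to realize "stable inter-learner parameter sharing"; the paper's hierarchical training scheme and its joint ensemble objective are not reproduced here.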