Paper Title
REPAINT: Knowledge Transfer in Deep Reinforcement Learning
Paper Authors
Paper Abstract
Accelerating learning processes for complex tasks by leveraging previously learned tasks has been one of the most challenging problems in reinforcement learning, especially when the similarity between source and target tasks is low. This work proposes the REPresentation And INstance Transfer (REPAINT) algorithm for knowledge transfer in deep reinforcement learning. REPAINT not only transfers the representation of a pre-trained teacher policy in on-policy learning, but also uses an advantage-based experience selection approach in off-policy learning to transfer useful samples collected by following the teacher policy. Our experimental results on several benchmark tasks show that REPAINT significantly reduces the total training time in generic cases of task similarity. In particular, when the source tasks are dissimilar to, or sub-tasks of, the target tasks, REPAINT outperforms other baselines in both training-time reduction and asymptotic performance of return scores.
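The abstract describes the off-policy component only at a high level. The minimal Python sketch below illustrates one plausible reading of advantage-based experience selection: transitions collected with the teacher policy are kept only when their estimated advantage under the current student exceeds a threshold. The function name select_experiences, the threshold zeta, and the advantage_fn callable are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def select_experiences(teacher_batch, advantage_fn, zeta=0.0):
    """Filter teacher-collected transitions by estimated advantage.

    Illustrative sketch; names and threshold are assumptions:
      teacher_batch: list of (state, action, reward, next_state) tuples
                     gathered by rolling out the pre-trained teacher policy.
      advantage_fn:  callable (state, action) -> float giving the student's
                     advantage estimate, e.g. A(s, a) = Q(s, a) - V(s).
      zeta:          selection threshold; transitions whose advantage does
                     not exceed zeta are discarded.
    """
    return [t for t in teacher_batch if advantage_fn(t[0], t[1]) > zeta]


# Toy usage with random transitions and a dummy advantage estimator.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = [
        (rng.normal(size=4), int(rng.integers(0, 2)), 0.0, rng.normal(size=4))
        for _ in range(8)
    ]
    dummy_advantage = lambda s, a: float(s.sum()) - 0.5 * a
    kept = select_experiences(batch, dummy_advantage, zeta=0.0)
    print(f"kept {len(kept)} of {len(batch)} teacher transitions")
```

Under this reading, only the retained transitions would enter the student's off-policy update, which is how instance transfer can stay useful even when the teacher's source task differs from the target task.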