Student-Initiated Action Advising via Advice Novelty

Paper title

Student-Initiated Action Advising via Advice Novelty

Authors

Ercument Ilhan, Jeremy Gow, Diego Perez-Liebana

Abstract

Action advising is a budget-constrained knowledge exchange mechanism between teacher-student peers that can help tackle exploration and sample inefficiency problems in deep reinforcement learning (RL). Most recently, student-initiated techniques that utilise state novelty and uncertainty estimations have obtained promising results. However, the approaches built on these estimations have some potential weaknesses. First, they assume that the convergence of the student's RL model implies less need for advice. This can be misleading in scenarios where the teacher is absent early on: the student is likely to learn suboptimally by itself, yet it will also ignore the teacher's assistance later. Second, in the presence of experience replay dynamics, the delay between encountering states and having them take effect in the RL model updates causes a feedback lag in what the student actually needs advice for. We propose a student-initiated algorithm that alleviates these weaknesses by employing Random Network Distillation (RND) to measure the novelty of a piece of advice. Furthermore, we perform RND updates only for the advised states to ensure that the student's own learning does not impair its ability to leverage the teacher. Experiments in GridWorld and MinAtar show that our approach performs on par with the state-of-the-art and demonstrates significant advantages in the scenarios where the existing methods are prone to fail.
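
The core idea in the abstract (ask for advice when the RND error for a state is high, and train RND only on advised states) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The network shapes, the linear predictor, the fixed threshold, the number of update steps, and the names `RND`, `maybe_ask_for_advice` and `teacher_policy` are all assumptions made for illustration, and states are assumed to be small feature vectors with roughly unit norm.

```python
import math
import random


class RND:
    """Tiny Random Network Distillation module (illustrative assumption only)."""

    def __init__(self, state_dim, embed_dim=8, lr=0.5, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target network: f(s) = tanh(W_t s).
        self.w_target = [[rng.uniform(-1.0, 1.0) for _ in range(state_dim)]
                         for _ in range(embed_dim)]
        # Trainable predictor network: g(s) = W_p s, initialised to zeros.
        self.w_pred = [[0.0] * state_dim for _ in range(embed_dim)]
        self.lr = lr

    def _target(self, s):
        return [math.tanh(sum(w * x for w, x in zip(row, s)))
                for row in self.w_target]

    def _predict(self, s):
        return [sum(w * x for w, x in zip(row, s)) for row in self.w_pred]

    def novelty(self, s):
        """Mean squared error between predictor and target outputs at s."""
        t, p = self._target(s), self._predict(s)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, s):
        """One gradient step on the mean squared error at state s."""
        t, p = self._target(s), self._predict(s)
        n = len(t)
        for i, row in enumerate(self.w_pred):
            grad_out = 2.0 * (p[i] - t[i]) / n
            for j, x in enumerate(s):
                row[j] -= self.lr * grad_out * x


def maybe_ask_for_advice(state, rnd, teacher_policy, budget,
                         threshold, update_steps=20):
    """Return (advised action or None, remaining budget).

    Advice is requested only if budget remains and the RND error for this
    state exceeds the (hypothetical) threshold.
    """
    if budget <= 0 or rnd.novelty(state) <= threshold:
        return None, budget
    action = teacher_policy(state)
    # RND is trained only on advised states, so the student's own
    # exploration never lowers the measured novelty of advice.
    for _ in range(update_steps):
        rnd.update(state)
    return action, budget - 1
```

In this sketch a state that has already been advised yields a low RND error and is not asked about again, whereas states the student has merely visited keep their novelty, because `update` is never called for them.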
