Paper Title

Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones

Authors

Brijen Thananjeyan, Ashwin Balakrishna, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, Ken Goldberg

Abstract

Safety remains a central obstacle preventing widespread use of RL in the real world: learning new tasks in uncertain environments requires extensive exploration, but safety requires limiting exploration. We propose Recovery RL, an algorithm which navigates this tradeoff by (1) leveraging offline data to learn about constraint violating zones before policy learning and (2) separating the goals of improving task performance and constraint satisfaction across two policies: a task policy that only optimizes the task reward and a recovery policy that guides the agent to safety when constraint violation is likely. We evaluate Recovery RL on 6 simulation domains, including two contact-rich manipulation tasks and an image-based navigation task, and an image-based obstacle avoidance task on a physical robot. We compare Recovery RL to 5 prior safe RL methods which jointly optimize for task performance and safety via constrained optimization or reward shaping and find that Recovery RL outperforms the next best prior method across all domains. Results suggest that Recovery RL trades off constraint violations and task successes 2 - 20 times more efficiently in simulation domains and 3 times more efficiently in physical experiments. See https://tinyurl.com/rl-recovery for videos and supplementary material.
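
The abstract describes two mechanisms: a risk estimate learned from offline data about constraint-violating zones, and a runtime switch between a task policy and a recovery policy. The sketch below illustrates only that switching logic under stated assumptions; names such as select_action, q_risk, eps_risk, task_policy, and recovery_policy are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def select_action(state, task_policy, recovery_policy, q_risk, eps_risk=0.3):
    """Hypothetical sketch of the task/recovery switching described in the abstract.

    q_risk(state, action) is assumed to return an estimate of the chance of a
    future constraint violation, pretrained on offline data of unsafe transitions.
    """
    a_task = task_policy(state)            # action that optimizes the task reward only
    if q_risk(state, a_task) > eps_risk:   # proposed action looks likely to violate a constraint
        return recovery_policy(state)      # hand control to the recovery policy to steer back to safety
    return a_task                          # otherwise execute the task action


# Toy usage with stand-in policies and a hand-coded risk estimate.
if __name__ == "__main__":
    task_policy = lambda s: np.array([1.0, 0.0])       # always push forward
    recovery_policy = lambda s: np.array([-1.0, 0.0])  # back away from the unsafe region
    q_risk = lambda s, a: float(s[0] + a[0] > 1.5)     # "risky" near the boundary

    print(select_action(np.array([1.0, 0.0]), task_policy, recovery_policy, q_risk))  # recovery action
    print(select_action(np.array([0.2, 0.0]), task_policy, recovery_policy, q_risk))  # task action
```

In this sketch the task policy never has to trade off reward against safety, which mirrors the separation of objectives the abstract emphasizes; the threshold eps_risk is an assumed tuning knob controlling how conservatively the recovery policy takes over.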
