Paper Title

Verifiably Safe Exploration for End-to-End Reinforcement Learning

Authors

Hunt, Nathan, Fulton, Nathan, Magliacane, Sara, Hoang, Nghia, Das, Subhro, Solar-Lezama, Armando

Abstract

Deploying deep reinforcement learning in safety-critical settings requires developing algorithms that obey hard constraints during exploration. This paper contributes a first approach toward enforcing formal safety constraints on end-to-end policies with visual inputs. Our approach draws on recent advances in object detection and automated reasoning for hybrid dynamical systems. The approach is evaluated on a novel benchmark that emphasizes the challenge of safely exploring in the presence of hard constraints. Our benchmark draws from several proposed problem sets for safe learning and includes problems that emphasize challenges such as reward signals that are not aligned with safety constraints. On each of these benchmark problems, our algorithm completely avoids unsafe behavior while remaining competitive at optimizing for as much reward as is safe. We also prove that our method of enforcing the safety constraints preserves all safe policies from the original environment.
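The abstract describes enforcing a formal safety constraint during exploration so the agent never executes an unsafe action. The core idea of such constraint enforcement can be sketched as a "shield" that intercepts each proposed action and substitutes a safe fallback whenever the action would violate the constraint. The sketch below is a minimal toy illustration of that general shielding pattern, not the paper's actual system: the 1D dynamics, the `limit` constraint, and the fixed `fallback` action are all hypothetical choices for illustration (the paper's method works on visual inputs via object detection and hybrid-systems reasoning).

```python
# Toy sketch of shield-style safe exploration: a monitor checks every
# proposed action against a formal safety constraint and replaces unsafe
# actions with a safe fallback before they reach the environment.
# All dynamics and names here are hypothetical, not the paper's system.

def is_safe(x, action, limit=10.0):
    """Safety constraint: the next state must stay strictly below `limit`."""
    return x + action < limit

def shield(x, proposed_action, fallback=-1.0, limit=10.0):
    """Pass the policy's action through if safe, else use the fallback."""
    if is_safe(x, proposed_action, limit):
        return proposed_action
    return fallback

def run_episode(policy_actions, x0=0.0, limit=10.0):
    """Apply shielded actions; the state never enters the unsafe region."""
    x = x0
    trace = [x]
    for a in policy_actions:
        x += shield(x, a, limit=limit)
        trace.append(x)
    return trace

trace = run_episode([3.0, 4.0, 5.0, 2.0])
# The third action (5.0) would reach 12.0, so the shield overrides it.
assert all(x < 10.0 for x in trace)
```

Note that the shield only intervenes when the constraint would be violated; safe actions pass through unchanged, which is the property behind the paper's claim that enforcement preserves all safe policies of the original environment.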
