Paper Title

Safe Reinforcement Learning of Control-Affine Systems with Vertex Networks

Paper Authors

Liyuan Zheng, Yuanyuan Shi, Lillian J. Ratliff, Baosen Zhang

Paper Abstract

This paper focuses on finding reinforcement learning policies for control systems with hard state and action constraints. Despite its success in many domains, reinforcement learning is challenging to apply to problems with hard constraints, especially if both the state variables and actions are constrained. Previous works seeking to ensure constraint satisfaction, or safety, have focused on adding a projection step to a learned policy. Yet, this approach requires solving an optimization problem at every policy execution step, which can lead to significant computational costs. To tackle this problem, this paper proposes a new approach, termed Vertex Networks (VNs), with guarantees on safety during exploration and on learned control policies by incorporating the safety constraints into the policy network architecture. Leveraging the geometric property that all points within a convex set can be represented as the convex combination of its vertices, the proposed algorithm first learns the convex combination weights and then uses these weights along with the pre-calculated vertices to output an action. The output action is guaranteed to be safe by construction. Numerical examples illustrate that the proposed VN algorithm outperforms vanilla reinforcement learning in a variety of benchmark control tasks.
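
The convex-combination mechanism described in the abstract lends itself to a compact illustration. Below is a minimal sketch in PyTorch of how such a vertex-network policy head might look; the class name (`VertexPolicy`), the layer sizes, and the softmax parameterization of the combination weights are illustrative assumptions, not the authors' exact architecture, and the vertices of the safe action set are assumed to be pre-computed.

```python
import torch
import torch.nn as nn

class VertexPolicy(nn.Module):
    """Hypothetical Vertex Network policy head: maps a state to convex
    combination weights over pre-computed vertices of the safe action set."""

    def __init__(self, state_dim: int, num_vertices: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_vertices),
        )

    def forward(self, state: torch.Tensor, vertices: torch.Tensor) -> torch.Tensor:
        # vertices: (num_vertices, action_dim), pre-calculated offline.
        # softmax makes the weights non-negative and sum to one, so the
        # output is a convex combination of the vertices.
        weights = torch.softmax(self.net(state), dim=-1)
        return weights @ vertices

# Example: a scalar action constrained to the interval [-1, 1], whose
# "vertices" are the two endpoints.
policy = VertexPolicy(state_dim=3, num_vertices=2)
safe_vertices = torch.tensor([[-1.0], [1.0]])
action = policy(torch.randn(3), safe_vertices)  # always lies in [-1, 1]
```

Because the softmax output is non-negative and sums to one, the returned action is a convex combination of safe vertices and therefore stays inside the convex safe set, which is the "safe by construction" property the abstract refers to; no projection or per-step optimization is needed.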
