Bridging Model-based Safety and Model-free Reinforcement Learning through System Identification of Low Dimensional Linear Models

Paper Title

Bridging Model-based Safety and Model-free Reinforcement Learning through System Identification of Low Dimensional Linear Models

Authors

Zhongyu Li, Jun Zeng, Akshay Thirugnanam, Koushil Sreenath

Abstract

Bridging model-based safety and model-free reinforcement learning (RL) for dynamic robots is appealing because model-based methods can provide formal safety guarantees, while RL-based methods can exploit the robot's agility by learning from the full-order system dynamics. However, current approaches to this problem are mostly restricted to simple systems. In this paper, we propose a new method that combines model-based safety with model-free reinforcement learning by explicitly finding a low-dimensional model of the system controlled by an RL policy and applying stability and safety guarantees to that simple model. As an example, we use the complex bipedal robot Cassie, a high-dimensional nonlinear system with hybrid dynamics and underactuation, together with its RL-based walking controller. We show that a low-dimensional dynamical model is sufficient to capture the dynamics of the closed-loop system. We demonstrate that this model is linear, asymptotically stable, and decoupled across control inputs in all dimensions. We further show that this linearity persists even when different RL control policies are used. These results point to an interesting direction for understanding the relationship between RL and optimal control: whether, in some cases, RL tends to linearize the nonlinear system during training. Furthermore, we illustrate that the identified linear model can provide guarantees through a safety-critical optimal control framework, e.g., Model Predictive Control with Control Barrier Functions, in an example of autonomous navigation with Cassie, while still taking advantage of the agility provided by the RL-based controller.
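The idea of identifying a low-dimensional linear model from closed-loop data and then checking its stability can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' procedure: the discrete-time form x[k+1] = A x[k] + B u[k], the ordinary least-squares fit, the hypothetical helper names `fit_linear_model` and `is_asymptotically_stable`, and the synthetic two-dimensional system standing in for recorded robot data.

```python
import numpy as np

def fit_linear_model(X, U):
    """Least-squares fit of x[k+1] = A x[k] + B u[k] (assumed model form).

    X: (T+1, n) array of states, U: (T, m) array of inputs.
    """
    Z = np.hstack([X[:-1], U])                      # regressors, shape (T, n+m)
    Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
    n = X.shape[1]
    A = Theta[:n].T                                 # (n, n)
    B = Theta[n:].T                                 # (n, m)
    return A, B

def is_asymptotically_stable(A):
    """Discrete-time test: all eigenvalues strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)

# Synthetic, noise-free stand-in for closed-loop data (hypothetical values).
# The diagonal A and B mimic a model that is decoupled across inputs.
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.0], [0.0, 0.8]])
B_true = np.array([[0.1, 0.0], [0.0, 0.2]])
T = 200
U = rng.standard_normal((T, 2))
X = np.zeros((T + 1, 2))
for k in range(T):
    X[k + 1] = A_true @ X[k] + B_true @ U[k]

A_hat, B_hat = fit_linear_model(X, U)
stable = is_asymptotically_stable(A_hat)
```

With real measurements the fit would not be exact, so the residual of the regression would indicate how well a linear model of the chosen dimension captures the closed-loop behavior.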
