Event-Triggered Model Predictive Control with Deep Reinforcement Learning for Autonomous Driving

Paper Title

Event-Triggered Model Predictive Control with Deep Reinforcement Learning for Autonomous Driving

Authors

Fengying Dang, Dong Chen, Jun Chen, Zhaojian Li

Abstract

Event-triggered model predictive control (eMPC) is a popular optimal control method that aims to alleviate the computation and/or communication burden of MPC. However, designing the event-trigger policy generally requires a priori knowledge of the closed-loop system behavior and of the communication characteristics. This paper addresses this challenge by proposing an efficient eMPC framework and demonstrating its successful implementation on autonomous vehicle path following. First, a model-free reinforcement learning (RL) agent learns the optimal event-trigger policy without requiring complete knowledge of the system dynamics and communication. Furthermore, techniques including a prioritized experience replay (PER) buffer and long short-term memory (LSTM) are employed to foster exploration and improve training efficiency. The proposed framework is used with three deep RL algorithms, i.e., Double Q-learning (DDQN), Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC), to solve this problem. Experimental results show that all three deep RL-based eMPC variants (deep-RL-eMPC) achieve better evaluation performance than the conventional threshold-based approach and the previous linear Q-based approach in autonomous path following. In particular, PPO-eMPC with LSTM and DDQN-eMPC with PER and LSTM obtain a superior balance between closed-loop control performance and event-trigger frequency. The associated code is open-sourced and available at: https://github.com/DangFengying/RL-based-event-triggered-MPC.
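
To make the event-trigger idea concrete, the following minimal sketch illustrates only the conventional threshold-based trigger that the abstract names as a baseline. It is not the authors' implementation and does not include the learned RL policy. The scalar plant `x_next = x + u + d`, the proportional rule inside `plan` (a stand-in for a real MPC solve), and all function and parameter names are assumptions made for illustration.

```python
def plan(x, horizon=5, gain=0.5):
    """Stand-in for an MPC solve (assumption, not a real optimizer).

    Rolls a nominal model x_next = x + u forward and returns the planned
    control sequence together with the predicted states.
    """
    controls, predicted = [], []
    for _ in range(horizon):
        u = -gain * x
        x = x + u
        controls.append(u)
        predicted.append(x)
    return controls, predicted


def run_event_triggered(x0, disturbances, threshold, horizon=5):
    """Simulate the loop; re-solve only when an event is triggered.

    An event fires when the stored plan is exhausted or when the measured
    state deviates from the predicted state by more than `threshold`.
    Returns the visited states and the number of solves.
    """
    x = x0
    expected = x0
    controls, predicted = [], []
    k = horizon  # forces a solve at the first step
    solves = 0
    states = []
    for d in disturbances:
        if k >= horizon or abs(x - expected) > threshold:
            controls, predicted = plan(x, horizon)
            k = 0
            solves += 1
        x = x + controls[k] + d   # true plant, with disturbance d
        expected = predicted[k]   # what the nominal model predicted
        k += 1
        states.append(x)
    return states, solves


if __name__ == "__main__":
    quiet = [0.0] * 10
    _, event_solves = run_event_triggered(1.0, quiet, threshold=0.1)
    # A negative threshold triggers at every step, i.e. time-triggered MPC.
    _, periodic_solves = run_event_triggered(1.0, quiet, threshold=-1.0)
    print(event_solves, periodic_solves)
```

In this toy setting the trigger condition is a hand-picked threshold on the prediction error. The framework described in the abstract replaces that hand-designed rule with a policy learned by an RL agent, trading off control performance against how often the solver runs.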
