Paper Title

Reinforcement Learning for Mixed-Integer Problems Based on MPC

Authors

Sebastien Gros, Mario Zanon

Abstract

Model Predictive Control (MPC) has recently been proposed as a policy approximation for Reinforcement Learning, offering a path towards safe and explainable Reinforcement Learning. This approach has been investigated for Q-learning and actor-critic methods, both in the context of nominal Economic MPC and Robust (N)MPC, showing very promising results. In that context, actor-critic methods appear to be the most reliable approach. Many applications involve a mixture of continuous and integer inputs, for which the classical actor-critic methods need to be adapted. In this paper, we present a policy approximation based on mixed-integer MPC schemes, and propose a computationally inexpensive technique to generate exploration in the mixed-integer input space that ensures satisfaction of the constraints. We then propose a simple compatible advantage function approximation for the proposed policy, which allows one to build the gradient of the mixed-integer MPC-based policy.
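
For background, the "compatible advantage function approximation" mentioned in the abstract builds on the deterministic policy gradient theorem with compatible function approximation. The following is a minimal sketch in the standard notation of that literature ($\pi_\theta$ denotes the parametric policy, $A_w$ the advantage approximation with weight vector $w$); it illustrates the generic construction, not the paper's specific mixed-integer formulas:

\nabla_\theta J(\pi_\theta) = \mathbb{E}_s\left[ \nabla_\theta \pi_\theta(s) \, \nabla_a A^{\pi_\theta}(s,a) \big|_{a=\pi_\theta(s)} \right],
\qquad
A_w(s,a) = \left( a - \pi_\theta(s) \right)^\top \nabla_\theta \pi_\theta(s)^\top w .

When $\pi_\theta(s)$ is delivered by an MPC scheme, $\nabla_\theta \pi_\theta(s)$ can be obtained by sensitivity analysis of the MPC solution at its KKT point; a key difficulty in the mixed-integer setting is that the solution map is not differentiable in the integer inputs, which is why the exploration scheme and the advantage approximation need to be adapted as the abstract describes.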
