Paper Title

Reinforcement Learning for Mixed-Integer Problems Based on MPC

Authors

Sebastien Gros, Mario Zanon

Abstract

Model Predictive Control (MPC) has recently been proposed as a policy approximation for Reinforcement Learning, offering a path towards safe and explainable Reinforcement Learning. This approach has been investigated for Q-learning and actor-critic methods, both in the context of nominal Economic MPC and Robust (N)MPC, showing very promising results. In that context, actor-critic methods appear to be the most reliable approach. Many applications involve a mixture of continuous and integer inputs, for which the classical actor-critic methods need to be adapted. In this paper, we present a policy approximation based on mixed-integer MPC schemes, and propose a computationally inexpensive technique to generate exploration in the mixed-integer input space that ensures satisfaction of the constraints. We then propose a simple compatible advantage function approximation for the proposed policy, which allows one to build the gradient of the mixed-integer MPC-based policy.
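
For background, the "compatible advantage function approximation" mentioned in the abstract builds on the deterministic policy gradient theorem with compatible function approximation. The following is a minimal sketch in the standard notation of that literature ($\pi_\theta$ denotes the parametric policy, $A_w$ the advantage approximation with weight vector $w$); it illustrates the generic construction, not the paper's specific mixed-integer formulas:

\nabla_\theta J(\pi_\theta) = \mathbb{E}_s\left[ \nabla_\theta \pi_\theta(s) \, \nabla_a A^{\pi_\theta}(s,a) \big|_{a=\pi_\theta(s)} \right],
\qquad
A_w(s,a) = \left( a - \pi_\theta(s) \right)^\top \nabla_\theta \pi_\theta(s)^\top w .

When $\pi_\theta(s)$ is delivered by an MPC scheme, $\nabla_\theta \pi_\theta(s)$ can be obtained by sensitivity analysis of the MPC solution at its KKT point; a key difficulty in the mixed-integer setting is that the solution map is not differentiable in the integer inputs, which is why the exploration scheme and the advantage approximation need to be adapted as the abstract describes.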
