Paper Title


Policy Optimization for Markovian Jump Linear Quadratic Control: Gradient-Based Methods and Global Convergence

Authors

Joao Paulo Jansch-Porto, Bin Hu, Geir Dullerud

Abstract


Recently, policy optimization for control purposes has received renewed attention due to the increasing interest in reinforcement learning. In this paper, we investigate the global convergence of gradient-based policy optimization methods for quadratic optimal control of discrete-time Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS, with static state feedback controllers and quadratic performance costs. Despite the non-convexity of the resultant problem, we are still able to identify several useful properties such as coercivity, gradient dominance, and almost smoothness. Based on these properties, we show global convergence of three types of policy optimization methods: the gradient descent method; the Gauss-Newton method; and the natural policy gradient method. We prove that all three methods converge to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller which is mean-square stabilizing. Some numerical examples are presented to support the theory. This work brings new insights for understanding the performance of policy gradient methods on the Markovian jump linear quadratic control problem.
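The abstract describes the setting only at a high level. Below is a minimal, illustrative sketch (not the authors' implementation) of policy optimization for the Markovian jump linear quadratic problem: a two-mode discrete-time MJLS with mode-dependent static state-feedback gains, a Monte Carlo estimate of the quadratic cost, and a simple zeroth-order gradient-descent update on the gains. All system matrices, the mode transition probabilities, the step size, and the gradient estimator are assumptions made for illustration; the paper itself analyzes exact gradient descent, Gauss-Newton, and natural policy gradient updates.

```python
# Minimal sketch of MJLS-LQR policy gradient descent (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

# Two-mode MJLS: x_{t+1} = A[w_t] x_t + B[w_t] u_t, where the mode w_t follows a
# Markov chain with transition matrix P. All numbers below are made up for illustration.
A = [np.array([[1.0, 0.5], [0.0, 1.0]]), np.array([[0.9, 0.2], [0.1, 0.8]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.5], [1.0]])]
P = np.array([[0.9, 0.1], [0.3, 0.7]])
Q = [np.eye(2), np.eye(2)]   # per-mode state cost weights
R = [np.eye(1), np.eye(1)]   # per-mode input cost weights

def cost(K, horizon=50, rollouts=20):
    """Monte Carlo estimate of the quadratic cost under u_t = -K[w_t] x_t."""
    total = 0.0
    for _ in range(rollouts):
        x = rng.standard_normal(2)      # random initial state
        w = rng.integers(2)             # random initial mode
        for _ in range(horizon):
            u = -K[w] @ x
            total += x @ Q[w] @ x + u @ R[w] @ u
            x = A[w] @ x + B[w] @ u
            w = rng.choice(2, p=P[w])
    return total / rollouts

# Mode-dependent static state-feedback gains; the paper requires initialization at a
# mean-square stabilizing controller, which this hand-picked guess is meant to mimic.
K = [np.array([[0.5, 1.0]]), np.array([[0.4, 0.9]])]

# Zeroth-order (two-point) gradient estimate followed by a plain gradient step.
eta, sigma, samples = 1e-4, 0.05, 10
for it in range(100):
    grads = [np.zeros_like(Ki) for Ki in K]
    for _ in range(samples):
        U = [rng.standard_normal(Ki.shape) for Ki in K]
        Kp = [Ki + sigma * Ui for Ki, Ui in zip(K, U)]
        Km = [Ki - sigma * Ui for Ki, Ui in zip(K, U)]
        scale = (cost(Kp) - cost(Km)) / (2.0 * sigma)
        for g, Ui in zip(grads, U):
            g += scale * Ui / samples
    K = [Ki - eta * g for Ki, g in zip(K, grads)]
    if it % 20 == 0:
        print(f"iter {it:3d}  estimated cost {cost(K):.3f}")
```

The zeroth-order estimator stands in for the exact gradient only to keep the sketch self-contained; in the paper's setting the gradient of the cost with respect to the gains is available in closed form, and the analysis also covers the Gauss-Newton and natural policy gradient updates, all shown to converge linearly from a mean-square stabilizing initialization.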
