Paper Title

Convergence Guarantees of Policy Optimization Methods for Markovian Jump Linear Systems

Paper Authors

Joao Paulo Jansch-Porto, Bin Hu, Geir Dullerud

Paper Abstract

Recently, policy optimization for control purposes has received renewed attention due to the increasing interest in reinforcement learning. In this paper, we investigate the convergence of policy optimization for quadratic control of Markovian jump linear systems (MJLS). First, we study the optimization landscape of direct policy optimization for MJLS and, in particular, show that despite the non-convexity of the resultant problem, the unique stationary point is the global optimal solution. Next, we prove that the Gauss-Newton method and the natural policy gradient method converge to the optimal state-feedback controller for MJLS at a linear rate if initialized at a controller that stabilizes the closed-loop dynamics in the mean-square sense. We propose a novel Lyapunov argument to fix a key stability issue in the convergence proof. Finally, we present a numerical example to support our theory. Our work brings new insights into understanding the performance of policy learning methods for controlling unknown MJLS.
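
The following is a minimal sketch, not the authors' code or their numerical example: all matrices, gains, step sizes, and the finite-difference gradient estimator are illustrative assumptions. It shows the objects the abstract refers to: a jump linear system whose mode follows a Markov chain, a mode-dependent state-feedback policy u_t = -K[w_t] x_t, the quadratic cost being minimized, and a plain gradient step over the gains (the paper's Gauss-Newton and natural-policy-gradient updates precondition this step).

```python
# Illustrative MJLS quadratic control sketch (assumed dynamics, not the paper's).
import numpy as np

# Mode-dependent dynamics x_{t+1} = A[w_t] x_t + B[w_t] u_t,
# where the mode w_t evolves as a Markov chain with transition matrix P.
A = [np.array([[1.0, 0.5], [0.0, 1.0]]),
     np.array([[0.8, -0.3], [0.2, 0.9]])]
B = [np.array([[0.0], [1.0]]),
     np.array([[1.0], [0.5]])]
Q = [np.eye(2), np.eye(2)]            # per-mode state cost weights
R = [np.eye(1), np.eye(1)]            # per-mode input cost weights
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # mode transition probabilities

def cost(K, horizon=150, rollouts=20, seed=0):
    """Monte Carlo estimate of the expected quadratic cost under the
    mode-dependent feedback u_t = -K[w_t] x_t. The fixed seed gives common
    random numbers, so the finite differences below are not drowned in noise."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(rollouts):
        x, w = rng.standard_normal(2), 0
        for _ in range(horizon):
            u = -K[w] @ x
            total += x @ Q[w] @ x + u @ R[w] @ u
            x = A[w] @ x + B[w] @ u
            w = rng.choice(2, p=P[w])
    return total / rollouts

def fd_grad(K, i, eps=1e-3):
    """Central finite-difference estimate of the cost gradient w.r.t. K[i];
    an illustrative stand-in for the exact gradient forms the paper analyzes."""
    g = np.zeros_like(K[i])
    for idx in np.ndindex(K[i].shape):
        Kp = [k.copy() for k in K]; Kp[i][idx] += eps
        Km = [k.copy() for k in K]; Km[i][idx] -= eps
        g[idx] = (cost(Kp) - cost(Km)) / (2 * eps)
    return g

# Start from gains that make each closed-loop mode stable in this toy setup
# (the theory requires a mean-square stabilizing initial controller) and take
# plain gradient steps over the mode-dependent gains.
K = [np.array([[0.5, 1.2]]), np.array([[0.6, 0.4]])]
for step in range(20):
    for i in range(2):
        K[i] -= 1e-3 * fd_grad(K, i)
    if step % 5 == 0:
        print(f"step {step:2d}  cost {cost(K):8.3f}")
```

Note that stability of each closed-loop mode does not in general imply mean-square stability of an MJLS; the convergence guarantees in the paper are stated with respect to the mean-square sense, so any real initialization should be checked against that criterion.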
