Regularity and stability of feedback relaxed controls

Paper title

Regularity and stability of feedback relaxed controls

Authors

Christoph Reisinger, Yufei Zhang

Abstract

This paper proposes a relaxed control regularization with general exploration rewards to design robust feedback controls for multi-dimensional continuous-time stochastic exit time problems. We establish that the regularized control problem admits a Hölder continuous feedback control, and demonstrate that both the value function and the feedback control of the regularized control problem are Lipschitz stable with respect to parameter perturbations. Moreover, we show that a pre-computed feedback relaxed control performs robustly in a perturbed system, and derive a first-order sensitivity equation for both the value function and the optimal feedback relaxed control. These stability results provide a theoretical justification for the recent reinforcement learning heuristic that including an exploration reward in the optimization objective leads to more robust decision making. Finally, we prove first-order monotone convergence of the value functions of the relaxed control problems as the exploration parameter vanishes, which in turn enables us to construct a pure exploitation strategy for the original control problem from the feedback relaxed controls.
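The convergence statement at the end of the abstract can be illustrated with a much simpler toy problem than the one the paper treats. The sketch below is not the authors' method. It assumes a static decision over a finite action set, written as a cost minimization, with Shannon entropy as the exploration reward (one particular choice among the general rewards the abstract mentions). Under these assumptions the regularized problem min over p of sum_i p_i c_i + eps * sum_i p_i log p_i has the closed-form value -eps * log(sum_i exp(-c_i / eps)), attained by a Gibbs (softmax) distribution. The names `regularized_value`, `gibbs_policy`, and `costs` are hypothetical.

```python
import math


def regularized_value(costs, eps):
    """Closed-form value of min_p sum(p*c) + eps*sum(p*log p) over probability vectors p."""
    m = min(costs)
    # Shift by the minimum cost so every exponent is <= 0 (avoids overflow).
    return m - eps * math.log(sum(math.exp(-(c - m) / eps) for c in costs))


def gibbs_policy(costs, eps):
    """Minimizer of the entropy-regularized problem: a softmax over negative costs."""
    m = min(costs)
    weights = [math.exp(-(c - m) / eps) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical action costs, chosen only for illustration.
costs = [1.0, 1.5, 3.0]

for eps in (1.0, 0.1, 0.01):
    value = regularized_value(costs, eps)
    policy = gibbs_policy(costs, eps)
    print(f"eps={eps}: value={value:.6f}, policy={[round(p, 4) for p in policy]}")
```

In this toy setting the regularized value increases monotonically to the unregularized minimum as eps decreases, with a gap of at most eps * log(K) for K actions, and the Gibbs distribution concentrates on the cheapest action, which plays the role of the pure exploitation strategy.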
