Probabilistic Control and Majorization of Optimal Control

Paper title

Probabilistic Control and Majorization of Optimal Control

Author

Lefebvre, Tom

Abstract

Probabilistic control design is founded on the principle that a rational agent attempts to match the modelled closed-loop system trajectory density with an arbitrary desired one. The framework was originally proposed as a tractable alternative to traditional optimal control design, parametrizing desired behaviour through fictitious transition and policy densities and using the information projection as a proximity measure. In this work we introduce an alternative parametrization of desired closed-loop behaviour and explore alternative proximity measures between densities. We then illustrate how the associated probabilistic control problems yield uncertain, or probabilistic, policies. Our main result is that the probabilistic control objectives majorize the conventional optimal control objectives, both stochastic and risk-sensitive. This observation allows us to identify two probabilistic fixed-point iterations that converge to the deterministic optimal control policies, establishing an explicit connection between the two formulations. Further, we demonstrate that the risk-sensitive optimal control formulation is also technically equivalent to a maximum likelihood estimation problem on a probabilistic graphical model in which the notion of cost is encoded directly into the model. The associated treatment of the estimation problem is then shown to coincide with the moment-projected probabilistic control formulation. In this way, optimal decision making can be reformulated as an iterative inference problem. Based on these insights, we discuss directions for algorithmic development.
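
For readers unfamiliar with the terminology, the two proximity measures named in the abstract and the notion of majorization can be sketched with their standard textbook definitions. The notation here is an assumption made for illustration and is not taken from the paper: $p_\pi$ is the modelled closed-loop trajectory density under policy $\pi$, $p^\ast$ is the desired density, $J$ is a generic optimal control objective, and $\bar{J}$ is a surrogate objective. The paper's own symbols and exact objectives may differ, and reading "majorize" in the majorize-minimize sense is likewise an assumption.

```latex
% Information projection (I-projection): KL divergence taken from the
% modelled density to the desired density.
\pi_{I} = \arg\min_{\pi} \, \mathrm{KL}\!\left(p_\pi \,\|\, p^\ast\right)
        = \arg\min_{\pi} \int p_\pi(\tau) \log \frac{p_\pi(\tau)}{p^\ast(\tau)} \, \mathrm{d}\tau

% Moment projection (M-projection): the arguments of the KL divergence
% are swapped.
\pi_{M} = \arg\min_{\pi} \, \mathrm{KL}\!\left(p^\ast \,\|\, p_\pi\right)
        = \arg\min_{\pi} \int p^\ast(\tau) \log \frac{p^\ast(\tau)}{p_\pi(\tau)} \, \mathrm{d}\tau

% Majorization (majorize-minimize sense): \bar{J} majorizes J at \pi_k if
\bar{J}(\pi ; \pi_k) \ge J(\pi) \quad \text{for all } \pi,
\qquad
\bar{J}(\pi_k ; \pi_k) = J(\pi_k)

% Minimizing the surrogate then never increases the original objective:
\pi_{k+1} = \arg\min_{\pi} \bar{J}(\pi ; \pi_k)
\;\;\Longrightarrow\;\;
J(\pi_{k+1}) \le \bar{J}(\pi_{k+1} ; \pi_k) \le \bar{J}(\pi_k ; \pi_k) = J(\pi_k)
```

Under these assumptions, the last line shows why a majorizing objective naturally leads to a fixed-point iteration of the kind the abstract describes: each surrogate minimization yields a policy that is no worse under the original objective.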
