Data-driven Dynamic Multi-objective Optimal Control: An Aspiration-satisfying Reinforcement Learning Approach

Paper title

Data-driven Dynamic Multi-objective Optimal Control: An Aspiration-satisfying Reinforcement Learning Approach

Authors

Majid Mazouchi, Yongliang Yang, Hamidreza Modares

Abstract

This paper presents an iterative data-driven algorithm for solving dynamic multi-objective (MO) optimal control problems that arise in the control of nonlinear continuous-time systems. It is first shown that the Hamiltonian functional corresponding to each objective can be leveraged to compare the performance of admissible policies. Hamiltonian inequalities are then used whose satisfaction guarantees that the objectives' aspirations are met. An aspiration-satisfying dynamic optimization framework is then presented that optimizes the main objective while satisfying the aspirations of the other objectives. Its relation to the satisficing (good-enough) decision-making framework is shown. A sum-of-squares (SOS) based iterative algorithm is developed to solve the formulated aspiration-satisfying MO optimization. To obviate the requirement of complete knowledge of the system dynamics, a data-driven satisficing reinforcement learning approach is proposed that solves the SOS optimization problem in real time using only system trajectories measured over a time interval. Finally, two simulation examples are provided to show the effectiveness of the proposed algorithm.
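
As a rough illustration of why a Hamiltonian inequality can certify an aspiration level, consider the sketch below. The notation is assumed here and is not taken from the abstract: input-affine dynamics \dot{x} = f(x) + g(x)u, a running cost r_j(x, u) >= 0 for objective j, a continuously differentiable function V_j >= 0, and an aspiration level a_j. The paper's own formulation may differ.

```latex
% Assumed notation (not from the abstract): cost of objective j under policy u
J_j(x_0; u) = \int_0^\infty r_j\bigl(x(t), u(x(t))\bigr)\,dt

% Hamiltonian of objective j evaluated at the candidate function V_j
H_j(x, u, V_j) = \nabla V_j(x)^\top \bigl(f(x) + g(x)u\bigr) + r_j(x, u)

% If H_j <= 0 along the closed-loop trajectory, then dV_j/dt <= -r_j.
% Integrating over [0, T] and using V_j >= 0:
\int_0^T r_j\bigl(x(t), u(x(t))\bigr)\,dt \le V_j(x_0) - V_j\bigl(x(T)\bigr) \le V_j(x_0)

% Letting T -> infinity gives an upper bound on the cost:
J_j(x_0; u) \le V_j(x_0)

% Hence V_j(x_0) <= a_j together with H_j <= 0 implies J_j(x_0; u) <= a_j
```

Under these assumptions, bounding V_j at the initial state is enough to guarantee that objective j stays within its aspiration level, which is the kind of constraint an SOS program can encode when f, g, r_j, and V_j are polynomial.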
