Paper Title
Meta-Reinforcement Learning for the Tuning of PI Controllers: An Offline Approach
Paper Authors
Paper Abstract
Meta-learning is a branch of machine learning which trains neural network models to synthesize a wide variety of data in order to rapidly solve new problems. In process control, many systems have similar and well-understood dynamics, which suggests it is feasible to create a generalizable controller through meta-learning. In this work, we formulate a meta-reinforcement learning (meta-RL) control strategy that can be used to tune proportional-integral (PI) controllers. Our meta-RL agent has a recurrent structure that accumulates "context" to learn a system's dynamics through a hidden state variable in closed-loop. This architecture enables the agent to automatically adapt to changes in the process dynamics. In tests reported here, the meta-RL agent was trained entirely offline on first-order plus time-delay (FOPTD) systems, and produced excellent results on novel systems drawn from the same distribution of process dynamics used for training. A key design element is the ability to leverage model-based information offline during training in simulated environments, while maintaining a model-free policy structure for interacting with novel processes where there is uncertainty regarding the true process dynamics. Meta-learning is a promising approach for constructing sample-efficient intelligent controllers.
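To make the setup described in the abstract concrete, the sketch below pairs a discretized FOPTD process, G(s) = K e^{-θs} / (τs + 1), with a recurrent (GRU) policy whose hidden state accumulates closed-loop context and whose outputs are the PI gains. This is a minimal illustration under assumptions: the names (`FOPTDEnv`, `RecurrentTuner`), network sizes, reward, and Euler discretization are hypothetical choices, not the authors' implementation, and the offline RL training loop is omitted.

```python
# Illustrative sketch only -- not the paper's implementation.
import numpy as np
import torch
import torch.nn as nn

class FOPTDEnv:
    """First-order plus time-delay process, Euler-discretized, driven by a PI input."""
    def __init__(self, K=1.0, tau=10.0, theta=2.0, dt=1.0):
        self.K, self.tau, self.theta, self.dt = K, tau, theta, dt
        self.delay_steps = max(1, int(round(theta / dt)))
        self.reset()

    def reset(self, setpoint=1.0):
        self.y = 0.0
        self.setpoint = setpoint
        self.u_buffer = [0.0] * self.delay_steps  # input delay line for exp(-theta*s)
        return np.array([self.setpoint - self.y, self.y], dtype=np.float32)

    def step(self, u):
        self.u_buffer.append(u)
        u_delayed = self.u_buffer.pop(0)
        # dy/dt = (-y + K * u(t - theta)) / tau
        self.y += self.dt * (-self.y + self.K * u_delayed) / self.tau
        error = self.setpoint - self.y
        reward = -abs(error)  # simple tracking cost; the paper's objective may differ
        return np.array([error, self.y], dtype=np.float32), reward

class RecurrentTuner(nn.Module):
    """GRU policy: maps the closed-loop observation history to PI gains (Kp, Ki)."""
    def __init__(self, obs_dim=2, hidden_dim=32):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)

    def forward(self, obs, hidden=None):
        out, hidden = self.gru(obs, hidden)  # hidden state carries the accumulated "context"
        gains = torch.nn.functional.softplus(self.head(out[:, -1]))  # keep gains positive
        return gains, hidden

# Rollout on one sampled process: the hidden state adapts the suggested gains online.
env = FOPTDEnv(K=np.random.uniform(0.5, 2.0),
               tau=np.random.uniform(5.0, 20.0),
               theta=np.random.uniform(1.0, 5.0))
policy = RecurrentTuner()
obs, hidden, integral = env.reset(), None, 0.0
for t in range(50):
    obs_t = torch.tensor(obs).view(1, 1, -1)          # (batch=1, seq=1, obs_dim)
    gains, hidden = policy(obs_t, hidden)
    kp, ki = gains[0, 0].item(), gains[0, 1].item()
    error = obs[0]
    integral += error * env.dt
    u = kp * error + ki * integral                     # PI control law with the tuned gains
    obs, reward = env.step(u)
```

The design point this illustrates is that the recurrent hidden state acts as an implicit system identifier: the policy never sees K, τ, or θ directly, yet can adjust the gains it recommends as closed-loop data accumulates, which is what allows a single trained agent to generalize across the training distribution of process dynamics.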