Title
Chance-Constrained Control with Lexicographic Deep Reinforcement Learning
Authors
Abstract
This paper proposes a lexicographic Deep Reinforcement Learning (DeepRL) approach to chance-constrained Markov Decision Processes, in which the controller seeks to ensure that the probability of satisfying the constraints stays above given thresholds. Standard DeepRL approaches require i) the constraints to be included as additional weighted terms in the cost function, in a multi-objective fashion, and ii) the introduced weights to be tuned during the training phase of the Deep Neural Network (DNN) according to the probability thresholds. The proposed approach, instead, separately trains one constraint-free DNN and one DNN associated with each constraint, and then, at each time step, selects which DNN to use depending on the observed state of the system. The presented solution requires no hyper-parameter tuning beyond the standard DNN ones, even if the probability thresholds change. A lexicographic version of the well-known DeepRL algorithm DQN is also proposed and validated via simulations.
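The per-step selection between the trained networks can be sketched as a lexicographic action-selection rule. The following is a minimal illustration, not the paper's implementation: the function name, the tabular Q-values standing in for the DNN outputs, and the specific numbers are all assumptions made for the sake of the example.

```python
# Hedged sketch of lexicographic action selection with two trained value
# functions: one estimating the probability of satisfying the chance
# constraint, one estimating the unconstrained objective. Toy Q-tables
# stand in for the DNNs; all names and numbers are illustrative.

def lexicographic_action(q_constraint, q_objective, state, threshold):
    """Pick an action lexicographically: constraint satisfaction first.

    q_constraint[state][a]: estimated probability the constraint is satisfied.
    q_objective[state][a]:  estimated value of the unconstrained objective.
    threshold: required probability of constraint satisfaction.
    """
    actions = range(len(q_constraint[state]))
    # Actions believed to meet the chance constraint with prob >= threshold.
    admissible = [a for a in actions if q_constraint[state][a] >= threshold]
    if admissible:
        # Constraint can be met: optimize the constraint-free objective
        # over the admissible set.
        return max(admissible, key=lambda a: q_objective[state][a])
    # Otherwise defer to the constraint network: maximize the probability
    # of satisfying the constraint.
    return max(actions, key=lambda a: q_constraint[state][a])

# Toy example: one state, three actions (purely illustrative numbers).
q_c = {0: [0.95, 0.80, 0.99]}   # P(constraint satisfied)
q_o = {0: [1.0, 5.0, 2.0]}      # objective value

print(lexicographic_action(q_c, q_o, state=0, threshold=0.9))  # -> 2
```

Note how changing `threshold` only changes which actions are admissible at selection time; neither network needs retraining, which mirrors the abstract's claim that no additional hyper-parameter tuning is needed when the probability thresholds change.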