Paper Title
Lexicographic Multi-Objective Reinforcement Learning
Paper Authors
Paper Abstract
In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and subject to this constraint also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorithms that can be used to solve such problems, and prove that they converge to policies that are lexicographically optimal. We evaluate the scalability and performance of these algorithms empirically, demonstrating their practical applicability. As a more specific application, we show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
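To make the lexicographic ordering of objectives concrete, here is a minimal Python sketch of greedy action selection over a priority-ordered list of Q-tables: actions are first filtered by the highest-priority objective, and remaining ties are broken by the next objective, and so on. This is only an illustration of the general idea, not the paper's algorithm; the function name, the slack tolerance, and the toy values are assumptions introduced for this example.

import numpy as np

def lexicographic_greedy_action(q_tables, state, slack=1e-3):
    # q_tables: list of arrays of shape (n_states, n_actions), ordered by
    #           priority (q_tables[0] is the highest-priority objective).
    # slack:    tolerance within which an action still counts as near-optimal
    #           for a higher-priority objective and is kept for tie-breaking.
    #           (Hypothetical parameter for this sketch.)
    n_actions = q_tables[0].shape[1]
    candidates = np.arange(n_actions)  # start with every action admissible
    for q in q_tables:
        values = q[state, candidates]
        best = values.max()
        # keep only the actions within `slack` of the best value for this objective
        candidates = candidates[values >= best - slack]
        if len(candidates) == 1:
            break
    return int(candidates[0])

# Toy usage: one state, three actions, two objectives.
q1 = np.array([[1.0, 1.0, 0.2]])   # primary objective: actions 0 and 1 tie
q2 = np.array([[0.1, 0.9, 2.0]])   # secondary objective breaks the tie
print(lexicographic_greedy_action([q1, q2], state=0))  # prints 1

The slack threshold reflects a common design choice in lexicographic methods: insisting on exact optimality for higher-priority objectives would leave no room to improve lower-priority ones under function approximation, so a small tolerance is typically allowed.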