Paper Title

Deep W-Networks: Solving Multi-Objective Optimisation Problems With Deep Reinforcement Learning

Authors

Jernej Hribar, Luke Hackett, Ivana Dusparic

Abstract

In this paper, we build on advances introduced by the Deep Q-Networks (DQN) approach to extend the multi-objective tabular Reinforcement Learning (RL) algorithm W-learning to large state spaces. The W-learning algorithm naturally resolves the competition between multiple single policies in multi-objective environments. However, the tabular version does not scale well to environments with large state spaces. To address this issue, we replace the underlying Q-tables with DQNs and propose the addition of W-Networks as a replacement for the tabular weight (W) representations. We evaluate the resulting Deep W-Networks (DWN) approach on two widely accepted multi-objective RL benchmarks: deep sea treasure and multi-objective mountain car. We show that DWN resolves the competition between multiple policies while outperforming a DQN baseline. Additionally, we demonstrate that the proposed algorithm can find the Pareto front in both tested environments.
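To make the arbitration idea concrete, below is a minimal sketch of the competition mechanism the abstract describes, written in PyTorch. The class and method names (DWNAgent, act, w_target), the network sizes, and the exact form of the W-update target are illustrative assumptions on our part, not the authors' implementation: each objective gets its own Q-network (replacing a Q-table) plus a scalar-output W-network (replacing the W-values), and in each state the policy with the highest W-value wins and chooses the action.

```python
# Hedged sketch of DWN-style policy competition; architecture details
# are assumptions, not the paper's exact implementation.
import torch
import torch.nn as nn

class DWNAgent:
    def __init__(self, state_dim, n_actions, n_objectives):
        # One Q-network per objective, replacing the tabular Q-tables.
        self.q_nets = [nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_actions))
                       for _ in range(n_objectives)]
        # One W-network per objective, replacing the tabular W-values;
        # each outputs a scalar "importance" of that objective in this state.
        self.w_nets = [nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 1))
                       for _ in range(n_objectives)]

    def act(self, state):
        """Each policy nominates its greedy action; the policy with the
        highest W-value wins the competition for this state."""
        s = torch.as_tensor(state, dtype=torch.float32)
        w_values = [w(s).item() for w in self.w_nets]
        winner = max(range(len(w_values)), key=w_values.__getitem__)
        action = int(self.q_nets[winner](s).argmax().item())
        return action, winner

    def w_target(self, i, state, reward_i, next_state, gamma=0.99):
        """Assumed W-learning-style signal for a non-winning policy i:
        the value it expected from acting greedily minus the bootstrapped
        value it actually obtained by deferring to the winner."""
        s = torch.as_tensor(state, dtype=torch.float32)
        ns = torch.as_tensor(next_state, dtype=torch.float32)
        with torch.no_grad():
            expected = self.q_nets[i](s).max()
            realised = reward_i + gamma * self.q_nets[i](ns).max()
        return expected - realised
```

The W-target above mirrors what tabular W-learning bootstraps: the loss a policy suffers in states where it is not obeyed, so objectives that stand to lose the most win the competition there.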
