Paper Title

Tracking the Race Between Deep Reinforcement Learning and Imitation Learning -- Extended Version

Authors

Gros, Timo P., Höller, Daniel, Hoffmann, Jörg, Wolf, Verena

Abstract

Learning-based approaches for solving large sequential decision-making problems have become popular in recent years. The resulting agents perform differently, and their characteristics depend on those of the underlying learning approach. Here, we consider a benchmark planning problem from the reinforcement learning domain, the Racetrack, to investigate the properties of agents derived from different deep (reinforcement) learning approaches. We compare the performance of deep supervised learning, in particular imitation learning, to reinforcement learning for the Racetrack model. We find that imitation learning yields agents that follow riskier paths. In contrast, the decisions of deep reinforcement learning are more foresighted, i.e., they avoid states in which fatal decisions are more likely. Our evaluations show that, for this sequential decision-making problem, deep reinforcement learning performs best in many respects, even though imitation learning learns from optimal decisions.
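
The comparison in the abstract rests on the difference between the two training signals: imitation learning fits a policy to per-state optimal decisions supplied by an oracle, whereas reinforcement learning optimizes expected return from reward alone. The following toy sketch (plain NumPy, a hypothetical tabular setting; not the paper's Racetrack environment, its deep networks, or its code) illustrates that contrast. All names, labels, and the toy dynamics here are assumptions made purely for illustration.

```python
# Hypothetical toy sketch of the two paradigms the paper compares.
# NOT the paper's actual setup: the states, actions, oracle labels,
# and transition/reward model below are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 8, 4

# --- Imitation learning: supervised fit to optimal per-state decisions. ---
# Assumption: an oracle (e.g., a planner) labels each state with its
# optimal action; the policy is trained to reproduce those labels.
oracle_action = rng.integers(n_actions, size=n_states)  # hypothetical labels
logits = np.zeros((n_states, n_actions))
for _ in range(200):  # gradient ascent on the log-likelihood of the labels
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    grad = -probs
    grad[np.arange(n_states), oracle_action] += 1.0  # one-hot(label) - probs
    logits += 0.5 * grad
il_policy = logits.argmax(axis=1)  # mimics the oracle state by state

# --- Reinforcement learning: tabular Q-learning from reward alone. ---
# Assumed toy dynamics: any action leads to a uniformly random next state;
# landing in state 0 models a "crash" with a large penalty.
q = np.zeros((n_states, n_actions))
gamma, alpha = 0.95, 0.1
for _ in range(5000):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)            # purely exploratory action choice
    s_next = rng.integers(n_states)
    r = -10.0 if s_next == 0 else 1.0      # crash penalty vs. step reward
    q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
rl_policy = q.argmax(axis=1)               # maximizes long-run return

print("imitation policy:", il_policy)
print("RL policy:       ", rl_policy)
```

Because the Q-learning update bootstraps on the discounted value of the next state, the learned values reflect downstream crash risk rather than only the immediate decision; this is one way to read the "foresightedness" the abstract attributes to deep reinforcement learning.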
