Paper Title

DREAM: Deep Regret minimization with Advantage baselines and Model-free learning

Paper Authors

Eric Steinberger, Adam Lerer, Noam Brown

Paper Abstract

We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies in imperfect-information games with multiple agents. Formally, DREAM converges to a Nash Equilibrium in two-player zero-sum games and to an extensive-form coarse correlated equilibrium in all other games. Our primary innovation is an effective algorithm that, in contrast to other regret-based deep learning algorithms, does not require access to a perfect simulator of the game to achieve good performance. We show that DREAM empirically achieves state-of-the-art performance among model-free algorithms in popular benchmark games, and is even competitive with algorithms that do use a perfect simulator.
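The abstract places DREAM in the family of regret-minimization methods whose time-averaged strategies converge to a Nash equilibrium in two-player zero-sum games. As background only, the sketch below implements plain tabular regret matching on rock-paper-scissors; it is not DREAM itself, which replaces exact regret tables with neural-network advantage baselines and learns model-free from sampled trajectories rather than from a full payoff matrix. The payoff matrix, function names, and iteration count are illustrative assumptions, not taken from the paper.

```python
# Minimal tabular regret-matching sketch on rock-paper-scissors (background only,
# NOT the DREAM algorithm): the time-averaged strategies approach the uniform
# Nash equilibrium as iterations grow.
import numpy as np

ACTIONS = 3  # rock, paper, scissors
# Payoff matrix for player 1 (rows) vs. player 2 (columns): +1 win, -1 loss, 0 tie.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def regret_matching_strategy(regret_sum):
    """Map cumulative regrets to a strategy; uniform if no positive regret."""
    positive = np.maximum(regret_sum, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(ACTIONS, 1.0 / ACTIONS)

def self_play(iterations=20000):
    regret_sum = [np.zeros(ACTIONS), np.zeros(ACTIONS)]
    strategy_sum = [np.zeros(ACTIONS), np.zeros(ACTIONS)]
    for _ in range(iterations):
        strategies = [regret_matching_strategy(r) for r in regret_sum]
        for p in range(2):
            strategy_sum[p] += strategies[p]
        # Expected payoff of each pure action against the opponent's current mix.
        u1 = PAYOFF @ strategies[1]        # player 1's action values
        u2 = -PAYOFF.T @ strategies[0]     # player 2's action values (zero-sum)
        ev1 = strategies[0] @ u1
        ev2 = strategies[1] @ u2
        regret_sum[0] += u1 - ev1          # accumulate regrets for each action
        regret_sum[1] += u2 - ev2
    # Normalize the accumulated strategies to get the average strategy profile.
    return [s / s.sum() for s in strategy_sum]

if __name__ == "__main__":
    avg1, avg2 = self_play()
    print("Average strategies:", np.round(avg1, 3), np.round(avg2, 3))
```

Note that this toy example assumes full knowledge of the payoff matrix; the point of DREAM, per the abstract, is to reach the same kind of equilibrium guarantee without a perfect simulator, using sampled play and learned advantage baselines in place of exact expected values.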
