Towards automating Codenames spymasters with deep reinforcement learning

Paper title

Towards automating Codenames spymasters with deep reinforcement learning

Author

Siu, Sherman

Abstract

Although most reinforcement learning research has centered on competitive games, little work has been done on applying it to co-operative multiplayer games or text-based games. Codenames is a board game that involves both asymmetric co-operation and natural language processing, which makes it an excellent candidate for advancing RL research. To my knowledge, this work is the first to formulate Codenames as a Markov Decision Process and apply some well-known reinforcement learning algorithms such as SAC, PPO, and A2C to the environment. Although none of the above algorithms converge for the Codenames environment, neither do they converge for a simplified environment called ClickPixel, except when the board size is small.
