Paper title
Towards automating Codenames spymasters with deep reinforcement learning
Paper authors
Abstract
Although most reinforcement learning (RL) research has centered on competitive games, little work has been done on applying it to co-operative multiplayer games or text-based games. Codenames is a board game that involves both asymmetric co-operation and natural language processing, which makes it an excellent candidate for advancing RL research. To my knowledge, this work is the first to formulate Codenames as a Markov Decision Process and to apply well-known reinforcement learning algorithms such as SAC, PPO, and A2C to the environment. None of these algorithms converges on the Codenames environment, and they also fail to converge on a simplified environment called ClickPixel, except when the board size is small.