论文标题
求解零和单侧的部分可观察的随机游戏
Solving Zero-Sum One-Sided Partially Observable Stochastic Games
论文作者
论文摘要
许多安全性和其他现实世界的情况本质上是动态的,可以建模为严格竞争性(或零和零)动态游戏。在这些领域中,代理人会采取行动来影响环境,并就对手行动的情况和影响接收观察结果(可能是不完美的)。此外,代理可以执行的操作总数没有限制 - 也就是说,没有固定的视野。这些设置可以建模为可观察到的随机游戏(POSG)。但是,解决一般POSG在计算上是棘手的,因此我们专注于一个称为单方面POSG的POSG的广泛子类。在这些游戏中,只有一个代理商的对手对当前情况有充分的了解。我们提供了一个用于求解单方面POSG的完整图:我们(1)对单方面POSG及其价值功能进行了理论分析,(2)表明,值意式算法的变体在这种情况下会收敛,(3)适应启发式搜索值练习算法,以求解单方面的策略,以描述5个策略(4)使用该功能(4),(4)如何实现(4)范围(4),(4)在这种情况下(4)效果(4)效果(4),(4)在这种情况下(4)效果(4)范围(4)效果(4)证明我们的算法可以解决非平凡大小的单方面POSG,并在三个不同的领域中分析我们的算法的可扩展性:追捕,巡逻和搜索游戏。
Many security and other real-world situations are dynamic in nature and can be modelled as strictly competitive (or zero-sum) dynamic games. In these domains, agents perform actions to affect the environment and receive observations -- possibly imperfect -- about the situation and the effects of the opponent's actions. Moreover, there is no limitation on the total number of actions an agent can perform -- that is, there is no fixed horizon. These settings can be modelled as partially observable stochastic games (POSGs). However, solving general POSGs is computationally intractable, so we focus on a broad subclass of POSGs called one-sided POSGs. In these games, only one agent has imperfect information while their opponent has full knowledge of the current situation. We provide a full picture for solving one-sided POSGs: we (1) give a theoretical analysis of one-sided POSGs and their value functions, (2) show that a variant of a value-iteration algorithm converges in this setting, (3) adapt the heuristic search value-iteration algorithm for solving one-sided POSGs, (4) describe how to use approximate value functions to derive strategies in the game, and (5) demonstrate that our algorithm can solve one-sided POSGs of non-trivial sizes and analyze the scalability of our algorithm in three different domains: pursuit-evasion, patrolling, and search games.