A Predictive Strategy for the Iterated Prisoner's Dilemma

Paper Title

A Predictive Strategy for the Iterated Prisoner's Dilemma

Author

Prentner, Robert

Abstract

The iterated prisoner's dilemma is a game that, from very simple basic rules, produces many counter-intuitive and complex behaviors in a social environment. It illustrates that cooperation can be beneficial even in a competitive world, that individual fitness need not be the most important criterion of success, and that some strategies are very strong in direct confrontation yet perform poorly on average or are evolutionarily unstable. In this contribution, we present a strategy, PREDICTOR, which appears to be "sentient": it chooses to cooperate when playing against some strategies but defects when playing against others, without needing to record "tags" for its opponents or to rely on an involved decision-making mechanism. To operate in the highly contextual environment modeled by the iterated prisoner's dilemma, PREDICTOR learns from its experience to choose optimal actions by modeling its opponent and predicting a (fictive) future. PREDICTOR is shown to be an efficient strategy for playing the iterated prisoner's dilemma and is simple to implement. In a simulated, representative tournament, it achieves high average scores and wins the tournament for various parameter settings. PREDICTOR relies on a brief phase of exploration to improve its model, and it can evolve morality from intrinsically selfish behavior.
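
The abstract describes the idea only at a high level: explore briefly, model the opponent, then pick the action that looks best in a predicted (fictive) future. The sketch below is one possible reading of that idea and is not the paper's PREDICTOR algorithm. Everything specific in it is an assumption made for illustration: the conventional payoff values (5, 3, 1, 0), the class name `PredictiveSketch`, the opponent model (probability that the opponent cooperates given the player's previous move), the alternating four-round exploration, the pseudo-count prior, and the ten-round look-ahead over two constant plans.

```python
C, D = "C", "D"

# Assumed conventional payoffs for (my move, opponent move): T=5, R=3, P=1, S=0.
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


class PredictiveSketch:
    """Hypothetical predictive player; not the paper's implementation."""

    def __init__(self, explore_rounds=4, horizon=10):
        self.explore_rounds = explore_rounds  # assumed length of exploration
        self.horizon = horizon                # assumed look-ahead in rounds
        # counts[my_prev] = [opponent cooperations, observations], seeded
        # with a weak pseudo-count prior centered on 0.5.
        self.counts = {C: [0.5, 1.0], D: [0.5, 1.0]}
        self.my_moves = []

    def _p_coop(self, my_prev):
        coop, total = self.counts[my_prev]
        return coop / total

    @staticmethod
    def _expected(mine, p_coop):
        return p_coop * PAYOFF[(mine, C)] + (1 - p_coop) * PAYOFF[(mine, D)]

    def _plan_value(self, move):
        # Fictive future: play `move` in every round of the horizon.
        # The opponent's next move depends on my last real move; later
        # opponent moves depend on the planned move itself.
        p_now = self._p_coop(self.my_moves[-1]) if self.my_moves else 0.5
        first = self._expected(move, p_now)
        rest = (self.horizon - 1) * self._expected(move, self._p_coop(move))
        return first + rest

    def choose(self):
        t = len(self.my_moves)
        if t < self.explore_rounds:
            move = C if t % 2 == 0 else D  # alternate to sample both responses
        else:
            move = C if self._plan_value(C) >= self._plan_value(D) else D
        self.my_moves.append(move)
        return move

    def observe(self, opp_move):
        # The opponent's move this round is treated as a response to my
        # move in the previous round (my_moves[-2] after choose()).
        if len(self.my_moves) >= 2:
            prev = self.my_moves[-2]
            self.counts[prev][1] += 1
            if opp_move == C:
                self.counts[prev][0] += 1


def tit_for_tat(their_history):
    return their_history[-1] if their_history else C


def always_defect(their_history):
    return D


def always_cooperate(their_history):
    return C


def play(opponent, rounds=30):
    """Return the sketch player's moves against `opponent`."""
    player = PredictiveSketch()
    moves = []
    for _ in range(rounds):
        opp_move = opponent(player.my_moves)  # opponent sees past moves only
        moves.append(player.choose())
        player.observe(opp_move)
    return moves


if __name__ == "__main__":
    for name, opp in [("tit_for_tat", tit_for_tat),
                      ("always_defect", always_defect),
                      ("always_cooperate", always_cooperate)]:
        print(name, "".join(play(opp)))
```

Under these assumptions, the sketch settles into cooperation against tit-for-tat and into defection against both unconditional opponents, which mirrors the opponent-dependent behavior the abstract describes without attaching any label to the opponent. How the paper itself builds its model and chooses its horizon is not stated in the abstract.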
