Modulation of viability signals for self-regulatory control

Paper Title

Modulation of viability signals for self-regulatory control

Authors

Ovalle, Alvaro; Lucas, Simon M.

Abstract

We revisit the role of instrumental value as a driver of adaptive behavior. In active inference, instrumental or extrinsic value is quantified by the information-theoretic surprisal of a set of observations, which measures the extent to which those observations conform to prior beliefs or preferences. That is, an agent is expected to seek the type of evidence that is consistent with its own model of the world. For reinforcement learning tasks, the distribution of preferences replaces the notion of reward. We explore a scenario in which the agent learns this distribution in a self-supervised manner. In particular, we highlight the distinction between observations induced by the environment and those pertaining more directly to the continuity of an agent in time. We evaluate our methodology in a dynamic environment with discrete time and actions, first with a surprisal-minimizing model-free agent (in the RL sense) and then expanding to the model-based case to minimize the expected free energy.
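
The surprisal-based notion of value described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the observation labels, the probabilities, and the function names `surprisal` and `reward` are all hypothetical, and the preference distribution is fixed by hand here, whereas the paper has the agent learn it in a self-supervised manner.

```python
import math

# Hypothetical preference distribution over three discrete observations.
# In the paper this distribution is learned; here it is fixed for illustration.
preferences = {"safe": 0.70, "neutral": 0.25, "danger": 0.05}


def surprisal(obs, prefs):
    """Information-theoretic surprisal -ln p(obs) under the preferences."""
    return -math.log(prefs[obs])


def reward(obs, prefs):
    """Assumed reward substitute: negative surprisal (log-preference)."""
    return -surprisal(obs, prefs)


for obs in preferences:
    print(obs, round(surprisal(obs, preferences), 3))
```

Under this assumption, observations the agent prefers carry low surprisal and therefore a higher reward-like signal, which is the sense in which the preference distribution can stand in for reward in a reinforcement learning task.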
