A Behavioral Model for Exploration vs. Exploitation: Theoretical Framework and Experimental Evidence

Paper Title

A Behavioral Model for Exploration vs. Exploitation: Theoretical Framework and Experimental Evidence

Authors

Ding, Jingying; Feng, Yifan; Rong, Ying

Abstract

How do people navigate the exploration-exploitation (EE) trade-off when making repeated choices with unknown rewards? We study this question through the lens of multi-armed bandit problems and introduce a novel behavioral model, Quantal Choice with Adaptive Reduction of Exploration (QCARE). It generalizes Thompson Sampling, allowing for a principled way to quantify the EE trade-off and reflect human decision-making patterns. The model adaptively reduces exploration as information accumulates, with the reduction rate serving as a parameter to quantify the EE trade-off dynamics. We theoretically analyze how varying reduction rates influence decision quality, shedding light on the effects of "over-exploration" and "under-exploration." Empirically, we validate QCARE through experiments collecting behavioral data from human participants. QCARE not only captures critical behavioral patterns in the EE trade-off but also outperforms alternative models in predictive power. Our analysis reveals a behavioral tendency toward over-exploration.
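
The abstract does not give QCARE's functional form, so the sketch below is not the authors' model. It only illustrates plain Thompson Sampling, the baseline the abstract says QCARE generalizes, on a simulated Bernoulli bandit. The function name `thompson_sampling`, the uniform Beta(1, 1) priors, and the arm probabilities are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated bandit (illustrative only).

    true_means: hypothetical success probability of each arm (an assumption,
                not data from the paper).
    horizon:    number of rounds to play.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one sample per arm from its Beta(1 + successes, 1 + failures) posterior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.9], horizon=500, seed=1))
```

In this baseline, exploration shrinks only because the posteriors concentrate as observations accumulate. According to the abstract, QCARE instead treats the rate at which exploration is reduced as an explicit parameter; how that parameter enters the choice rule is not stated here and is not reproduced in the sketch.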
