Paper Title
A Behavioral Model for Exploration vs. Exploitation: Theoretical Framework and Experimental Evidence
Paper Authors
Paper Abstract
How do people navigate the exploration-exploitation (EE) trade-off when making repeated choices with unknown rewards? We study this question through the lens of multi-armed bandit problems and introduce a novel behavioral model, Quantal Choice with Adaptive Reduction of Exploration (QCARE). It generalizes Thompson Sampling, allowing for a principled way to quantify the EE trade-off and reflect human decision-making patterns. The model adaptively reduces exploration as information accumulates, with the reduction rate serving as a parameter to quantify the EE trade-off dynamics. We theoretically analyze how varying reduction rates influence decision quality, shedding light on the effects of "over-exploration" and "under-exploration." Empirically, we validate QCARE through experiments collecting behavioral data from human participants. QCARE not only captures critical behavioral patterns in the EE trade-off but also outperforms alternative models in predictive power. Our analysis reveals a behavioral tendency toward over-exploration.
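For readers unfamiliar with the baseline that QCARE generalizes, the following is a minimal sketch of standard Beta-Bernoulli Thompson Sampling on a simulated multi-armed bandit. It is not the QCARE model, whose definition the abstract does not give. The function name `thompson_sampling` and the `widen` parameter are hypothetical, introduced here only for illustration. `widen` divides the posterior counts, so values above 1 keep the posteriors broader and produce more exploration, while `widen=1.0` recovers ordinary Thompson Sampling.

```python
import random


def thompson_sampling(true_means, horizon, widen=1.0, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling; return (pulls per arm, total reward).

    `widen` is a hypothetical illustration knob, not a parameter from the paper:
    observed counts are divided by it before sampling, so widen > 1 means
    broader posteriors (more exploration) and widen = 1.0 is standard
    Thompson Sampling.
    """
    if widen <= 0:
        raise ValueError("widen must be positive")
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior (uniform prior).
        samples = [
            rng.betavariate(1 + successes[a] / widen, 1 + failures[a] / widen)
            for a in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        # Simulated Bernoulli reward from the arm's true (hidden) mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    for w in (1.0, 5.0):
        pulls, total = thompson_sampling([0.3, 0.5, 0.7], horizon=1000, widen=w)
        print(f"widen={w}: pulls={pulls}, total reward={total}")
```

Comparing the pull counts for different `widen` values gives a rough feel for how the amount of exploration changes which arms are tried, which is the kind of trade-off the reduction rate in the abstract is meant to quantify.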