Paper title
Expanding on Repeated Consumer Search Using Multi-Armed Bandits and Secretaries
Paper authors
Paper abstract
We take a different approach to deriving the optimal search policy for the repeated consumer search model of Fishman and Rob (1995), with the main motivation of dropping the assumption of prior knowledge of the price distribution $F(p)$ in each period. We do this by incorporating the well-known multi-armed bandit (MAB) problem. We start by modifying the MAB framework to fit the setting of the repeated consumer search model and formulating the objective as a dynamic optimization problem. Then, given any exploration sequence, we assign a value to each store in that sequence using Bellman equations. We then break the problem down into an optimal stopping problem for each period, which coincides with the framework of the famous secretary problem, and derive the optimal stopping policy. Finally, we show that implementing the optimal stopping policy in each period solves the original dynamic optimization problem by a `forward induction' argument.
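For intuition about the secretary problem referenced in the abstract, the sketch below simulates the classical 1/e stopping rule (reject roughly the first $n/e$ candidates, then accept the first one better than everything seen). This is a standard textbook illustration, not the stopping policy derived in the paper; the function names and parameters are our own.

```python
import math
import random

def secretary_search(ranks, cutoff):
    """Observe candidates in order; skip the first `cutoff`, then accept
    the first candidate better than all seen so far. `ranks[i]` is
    candidate i's quality (higher is better). Returns the index of the
    accepted candidate, or the last index if forced to stop."""
    best_seen = max(ranks[:cutoff]) if cutoff > 0 else float("-inf")
    for i in range(cutoff, len(ranks)):
        if ranks[i] > best_seen:
            return i
    return len(ranks) - 1  # no candidate beat the cutoff sample

def success_rate(n, trials, seed=0):
    """Monte Carlo estimate of the probability that the 1/e rule
    selects the single best of n randomly ordered candidates."""
    rng = random.Random(seed)
    cutoff = round(n / math.e)
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))
        rng.shuffle(ranks)
        if ranks[secretary_search(ranks, cutoff)] == n - 1:
            wins += 1
    return wins / trials
```

For moderate $n$, `success_rate(n, trials)` hovers near the theoretical optimum $1/e \approx 0.368$, which is the benchmark the classical secretary analysis guarantees.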