Paper Title
Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling
Paper Authors
Paper Abstract
As two popular schools of machine learning, online learning and evolutionary computation have become important driving forces behind real-world decision-making engines, with applications in biomedicine, economics, and engineering. Although prior work has used bandits to improve the optimization process of evolutionary algorithms, it remains an open question how evolutionary approaches can help improve the sequential decision-making of online learning agents such as multi-armed bandits. In this work, we propose Genetic Thompson Sampling, a bandit algorithm that maintains a population of agents and updates them with genetic principles such as elite selection, crossover, and mutation. Empirical results in multi-armed bandit simulation environments and on a practical epidemic control problem suggest that, by incorporating a genetic algorithm into the bandit algorithm, our method significantly outperforms the baselines in nonstationary settings. Lastly, we introduce EvoBandit, a web-based interactive visualization that guides readers through the entire learning process and performs lightweight evaluations on the fly. We hope this investigation engages researchers in this growing field of research.
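The abstract only describes the algorithm at a high level. Below is a minimal, illustrative Python sketch of one way a population of Beta-Bernoulli Thompson Sampling agents could be evolved with elite selection, crossover, and mutation. The round-robin acting rule, the per-agent average-reward fitness, the evolution period, and all class and parameter names are assumptions made for illustration, not the authors' exact implementation.

```python
import numpy as np

class TSAgent:
    """One Beta-Bernoulli Thompson Sampling agent: a Beta(alpha, beta) posterior per arm."""
    def __init__(self, n_arms, rng, alpha=None, beta=None):
        self.alpha = np.ones(n_arms) if alpha is None else alpha
        self.beta = np.ones(n_arms) if beta is None else beta
        self.rng = rng
        self.reward_sum = 0.0
        self.plays = 0

    def select_arm(self):
        # Thompson sampling: draw one sample per arm from the posterior, play the argmax.
        return int(np.argmax(self.rng.beta(self.alpha, self.beta)))

    def update(self, arm, reward):
        # Conjugate Bernoulli update of the played arm's posterior.
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward
        self.reward_sum += reward
        self.plays += 1

    def fitness(self):
        # Average reward collected while this agent was acting (illustrative fitness).
        return self.reward_sum / max(self.plays, 1)


class GeneticTS:
    """Population of TS agents evolved via elite selection, crossover, and mutation."""
    def __init__(self, n_arms, pop_size=10, n_elites=4,
                 mutation_scale=0.5, evolve_every=200, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_arms = n_arms
        self.pop = [TSAgent(n_arms, self.rng) for _ in range(pop_size)]
        self.n_elites = n_elites
        self.mutation_scale = mutation_scale
        self.evolve_every = evolve_every
        self.t = 0
        self.acting = self.pop[0]

    def select_arm(self):
        # Round-robin: agents take turns acting so each accumulates its own fitness.
        self.acting = self.pop[self.t % len(self.pop)]
        return self.acting.select_arm()

    def update(self, arm, reward):
        self.acting.update(arm, reward)
        self.t += 1
        if self.t % self.evolve_every == 0:
            self._evolve()

    def _evolve(self):
        # Elite selection: keep the best-performing agents.
        ranked = sorted(self.pop, key=lambda a: a.fitness(), reverse=True)
        elites = ranked[:self.n_elites]
        children = []
        while len(elites) + len(children) < len(self.pop):
            # Crossover: mix two elite parents' posterior parameters arm by arm.
            i, j = self.rng.choice(len(elites), size=2, replace=False)
            mask = self.rng.random(self.n_arms) < 0.5
            alpha = np.where(mask, elites[i].alpha, elites[j].alpha)
            beta = np.where(mask, elites[i].beta, elites[j].beta)
            # Mutation: random perturbation keeps exploration alive in nonstationary settings.
            alpha = np.maximum(alpha + self.rng.normal(0, self.mutation_scale, self.n_arms), 1e-3)
            beta = np.maximum(beta + self.rng.normal(0, self.mutation_scale, self.n_arms), 1e-3)
            children.append(TSAgent(self.n_arms, self.rng, alpha, beta))
        self.pop = elites + children
```

In this sketch, the mutation step re-injects uncertainty into the offspring's posteriors while elite selection discards agents whose beliefs have gone stale, which is one plausible intuition for why an evolved population could adapt faster than a single Thompson Sampling agent when reward distributions drift.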