Paper Title
Multi-Environment Meta-Learning in Stochastic Linear Bandits
Paper Authors
Paper Abstract
In this work, we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when task parameters are drawn from a mixture distribution instead. For this problem, we propose a regularized version of the OFUL algorithm that, when trained on tasks with labeled environments, achieves low regret on a new task without requiring knowledge of the environment from which the new task originates. Specifically, our regret bound for the new algorithm captures the effect of environment misclassification and highlights the benefits over learning each task separately or meta-learning without recognition of the distinct mixture components.
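To make the idea of "regularized OFUL" concrete, the sketch below shows one common way such a variant can be set up: the ridge penalty is biased toward a vector `bias` (in the meta-learning setting, this would play the role of an estimated mean parameter of the environment the new task is believed to come from). This is an illustrative assumption, not the paper's exact algorithm; the function name `biased_oful`, the noise level, and the fixed confidence width `beta` are all hypothetical choices for the sketch.

```python
import numpy as np

def biased_oful(arms, true_theta, bias, T=200, lam=1.0, beta=1.0, seed=None):
    """Run T rounds of an OFUL-style linear bandit whose ridge
    regularization is biased toward `bias` (a sketch, not the paper's
    exact method). Returns the cumulative pseudo-regret."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    V = lam * np.eye(d)        # regularized Gram matrix
    b_vec = lam * bias.copy()  # bias enters like prior pseudo-observations
    mean_rewards = arms @ true_theta
    opt = mean_rewards.max()
    regret = 0.0
    for _ in range(T):
        theta_hat = np.linalg.solve(V, b_vec)  # biased ridge estimate
        V_inv = np.linalg.inv(V)
        # Optimistic index: estimated reward plus confidence width
        widths = np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
        a = int(np.argmax(arms @ theta_hat + beta * widths))
        x = arms[a]
        r = x @ true_theta + 0.1 * rng.standard_normal()  # noisy reward
        V += np.outer(x, x)
        b_vec += r * x
        regret += opt - mean_rewards[a]
    return regret
```

When `bias` is close to the true task parameter (as a good environment estimate would be), the initial estimate already points toward the optimal arm, which is the intuition behind the benefit over learning each task from scratch with `bias = 0`.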