Adversarial Bandits against Arbitrary Strategies

Paper Title

Adversarial Bandits against Arbitrary Strategies

Authors

Jung-hun Kim, Se-Young Yun

Abstract

We study the adversarial bandit problem against arbitrary strategies, where $S$ is a parameter capturing the hardness of the problem and is not revealed to the agent. To handle this problem, we adopt the master-base framework with the online mirror descent method (OMD). We first provide a master-base algorithm with simple OMD that achieves a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, in which the $T^{2/3}$ factor comes from the variance of the loss estimators. To mitigate the impact of this variance, we propose adaptive learning rates for OMD and achieve $\tilde{O}(\min\{\mathbb{E}[\sqrt{SKT\rho_T(h^\dagger)}],S\sqrt{KT}\})$, where $\rho_T(h^\dagger)$ is a variance term for the loss estimators.
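
For orientation only, the sketch below shows the textbook Exp3 update, which can be read as OMD on the probability simplex with a negative-entropy regularizer and importance-weighted loss estimators. It is not the master-base algorithm of the paper, and it uses a fixed rather than adaptive learning rate. The names `exp3`, `loss_fn`, and `eta` are illustrative assumptions. The estimator `loss / p[arm]` grows as the sampling probability shrinks, which is the kind of estimator variance the abstract refers to.

```python
import math
import random


def exp3(loss_fn, K, T, eta, rng):
    """Textbook Exp3 (assumed illustration, not the paper's method).

    loss_fn(t, arm) must return a loss in [0, 1].
    Returns the final sampling distribution and the total incurred loss.
    """
    p = [1.0 / K] * K  # start from the uniform distribution
    total_loss = 0.0
    for t in range(T):
        # Sample an arm from the current distribution.
        arm = rng.choices(range(K), weights=p)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        # Importance-weighted estimate: unbiased, but large when p[arm] is small.
        estimate = loss / p[arm]
        # Negative-entropy OMD step = multiplicative update, then renormalize.
        p[arm] *= math.exp(-eta * estimate)
        z = sum(p)
        p = [w / z for w in p]
    return p, total_loss


if __name__ == "__main__":
    K, T = 3, 2000
    eta = math.sqrt(2.0 * math.log(K) / (K * T))  # a standard fixed-rate choice
    # Hypothetical environment: arm 0 always has loss 0, the others loss 1.
    final_p, total = exp3(lambda t, a: 0.0 if a == 0 else 1.0, K, T, eta,
                          random.Random(0))
    print(final_p, total)
```

In this toy environment the distribution concentrates on the zero-loss arm. Handling an unknown hardness parameter $S$ would need the master-base construction described in the abstract, which this sketch does not attempt.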
