Paper Title
Adversarial Bandits against Arbitrary Strategies
Authors
Abstract
We study the adversarial bandit problem against arbitrary strategies, where $S$ is a parameter measuring the hardness of the problem and is not given to the agent. To handle this problem, we adopt the master-base framework using the online mirror descent method (OMD). We first provide a master-base algorithm with simple OMD that achieves a regret bound of $\tilde{O}(S^{1/2}K^{1/3}T^{2/3})$, in which the $T^{2/3}$ factor comes from the variance of the loss estimators. To mitigate the impact of this variance, we propose using adaptive learning rates for OMD and achieve $\tilde{O}(\min\{\mathbb{E}[\sqrt{SKT\rho_T(h^\dagger)}],S\sqrt{KT}\})$, where $\rho_T(h^\dagger)$ is a variance term for the loss estimators.
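To make the role of the loss estimators concrete, the following is a minimal sketch of Exp3, which can be viewed as OMD on the probability simplex with a negative-entropy regularizer. It is not the paper's master-base algorithm and does not handle the switching parameter $S$. The function name `exp3`, the loss table, and the fixed learning rate `eta` are assumptions made for illustration only. The sketch shows where the variance mentioned in the abstract enters: the observed loss is divided by the probability of the played arm.

```python
import math
import random


def exp3(losses, eta, seed=0):
    """Run Exp3 on a T x K table of losses in [0, 1]; return the arms played.

    Illustrative sketch only: a single base learner with a fixed learning
    rate, not the master-base procedure described in the abstract.
    """
    rng = random.Random(seed)
    K = len(losses[0])
    cum_est = [0.0] * K  # cumulative importance-weighted loss estimates
    played = []
    for loss_t in losses:
        # OMD with the negative-entropy regularizer has the closed form
        # p_i proportional to exp(-eta * cumulative estimated loss of arm i).
        m = min(cum_est)  # subtract the minimum for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        total = sum(w)
        p = [x / total for x in w]
        arm = rng.choices(range(K), weights=p)[0]
        played.append(arm)
        # Only the played arm's loss is observed. Dividing by p[arm] keeps
        # the estimate unbiased, but its variance grows as p[arm] shrinks.
        cum_est[arm] += loss_t[arm] / p[arm]
    return played


if __name__ == "__main__":
    T, K = 2000, 2
    # Hypothetical loss table: arm 0 always has loss 0, arm 1 always has loss 1.
    table = [[0.0, 1.0] for _ in range(T)]
    eta = math.sqrt(math.log(K) / (K * T))
    arms = exp3(table, eta)
    print("fraction of rounds on arm 0:", arms.count(0) / T)
```

An adaptive learning rate, as proposed in the abstract, would replace the fixed `eta` with a value that changes from round to round; the sketch keeps it fixed for simplicity.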