论文标题
$α$ -NO-NO-REGRET算法用于图形双线匪徒
An $α$-No-Regret Algorithm For Graphical Bilinear Bandits
论文作者
论文摘要
我们提出了第一种基于遗憾的方法来解决图形双线匪徒问题,其中$ n $代理与每个邻居一起玩随机的双线性强盗游戏。该设置揭示了组合NP硬化问题,可防止(Bi-)线性匪徒文献中使用任何现有基于后悔的算法。在本文中,我们填补了这一空白,并提出了第一个基于遗憾的算法,该算法是使用不确定性的乐观原则来进行图形双线性匪徒。对该新方法的理论分析产生了$ \ tilde {o}(\ sqrt {t})$的上限对$α$ - regret上的上限,并证明了图结构对收敛速率的影响。最后,我们通过各种实验表明方法的有效性。
We propose the first regret-based approach to the Graphical Bilinear Bandits problem, where $n$ agents in a graph play a stochastic bilinear bandit game with each of their neighbors. This setting reveals a combinatorial NP-hard problem that prevents the use of any existing regret-based algorithm in the (bi-)linear bandit literature. In this paper, we fill this gap and present the first regret-based algorithm for graphical bilinear bandits using the principle of optimism in the face of uncertainty. Theoretical analysis of this new method yields an upper bound of $\tilde{O}(\sqrt{T})$ on the $α$-regret and evidences the impact of the graph structure on the rate of convergence. Finally, we show through various experiments the validity of our approach.