$α$ -NO-NO-REGRET算法用于图形双线匪徒

论文标题

$α$ -NO-NO-REGRET算法用于图形双线匪徒

An $α$-No-Regret Algorithm For Graphical Bilinear Bandits

论文作者

Rizk, Geovani, Colin, Igor, Thomas, Albert, Laraki, Rida, Chevaleyre, Yann

论文摘要

我们提出了第一种基于遗憾的方法来解决图形双线匪徒问题，其中$ n $代理与每个邻居一起玩随机的双线性强盗游戏。该设置揭示了组合NP硬化问题，可防止（Bi-）线性匪徒文献中使用任何现有基于后悔的算法。在本文中，我们填补了这一空白，并提出了第一个基于遗憾的算法，该算法是使用不确定性的乐观原则来进行图形双线性匪徒。对该新方法的理论分析产生了$ \ tilde {o}（\ sqrt {t}）$的上限对$α$ - regret上的上限，并证明了图结构对收敛速率的影响。最后，我们通过各种实验表明方法的有效性。

We propose the first regret-based approach to the Graphical Bilinear Bandits problem, where $n$ agents in a graph play a stochastic bilinear bandit game with each of their neighbors. This setting reveals a combinatorial NP-hard problem that prevents the use of any existing regret-based algorithm in the (bi-)linear bandit literature. In this paper, we fill this gap and present the first regret-based algorithm for graphical bilinear bandits using the principle of optimism in the face of uncertainty. Theoretical analysis of this new method yields an upper bound of $\tilde{O}(\sqrt{T})$ on the $α$-regret and evidences the impact of the graph structure on the rate of convergence. Finally, we show through various experiments the validity of our approach.

下载PDF全文

下载文献需遵守相关版权规定

论文标题