Exploiting Side Information for Improved Online Learning Algorithms in Wireless Networks

Paper Title

Exploiting Side Information for Improved Online Learning Algorithms in Wireless Networks

Authors

Hanawal, Manjesh K.; Darak, Sumit J.

Abstract

In wireless networks, the achieved rate depends on factors such as the interference level, hardware impairments, and channel gain. The instantaneous values of some of these factors can often be measured, and they provide useful information about the instantaneous rate. For example, higher interference implies a lower rate. In this work, we treat any such measurable quantity that has a non-zero correlation with the achieved rate as side-information and study how it can be exploited to quickly learn the channel that offers higher throughput (reward). When the mean value of the side-information is known, we use control variate theory to develop algorithms that require fewer samples to learn the parameters and can improve the learning rate compared to the case where side-information is ignored. Specifically, we incorporate side-information into the classical Upper Confidence Bound (UCB) algorithm and quantify the gain achieved in regret performance. We show that the gain is proportional to the amount of correlation between the reward and the associated side-information. We discuss in detail various types of side-information that can be exploited in cognitive radio and in air-to-ground communication in the $L$-band. We demonstrate that the correlation between the reward and the side-information is often strong in practice and that exploiting it improves throughput significantly.
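The control variate idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it shows only the textbook control variate mean estimator, which is the kind of estimate a UCB index could be built on. The function names (`control_variate_mean`, `simulate`), the Gaussian reward model, and all parameter values are assumptions made for illustration. In classical control variate theory, with the optimal coefficient, the variance of the estimate shrinks by a factor of roughly $1 - \rho^2$, where $\rho$ is the correlation between reward and side-information.

```python
import random
import statistics


def control_variate_mean(rewards, side_info, side_mean):
    """Estimate the mean reward using side-information whose true mean is known.

    Hypothetical helper: textbook control variate estimator, with the
    coefficient beta estimated from the same samples.
    """
    n = len(rewards)
    if n != len(side_info) or n < 2:
        raise ValueError("need at least two paired samples")
    r_bar = statistics.fmean(rewards)
    w_bar = statistics.fmean(side_info)
    # Sample covariance of (reward, side-info) and sample variance of side-info.
    cov = sum((r - r_bar) * (w - w_bar)
              for r, w in zip(rewards, side_info)) / (n - 1)
    var_w = sum((w - w_bar) ** 2 for w in side_info) / (n - 1)
    if var_w == 0:
        return r_bar  # side-information carries no signal; fall back to plain mean
    beta = cov / var_w
    # Correct the plain mean by how far the side-info average is from its known mean.
    return r_bar + beta * (side_mean - w_bar)


def simulate(trials=2000, n=30, rho=0.9, seed=0):
    """Compare estimator variance with and without the control variate.

    Assumed toy model: reward = 1 + rho * w + sqrt(1 - rho^2) * noise,
    where w (side-information) and noise are standard normal, so the
    reward has unit variance and correlation rho with w.
    """
    rng = random.Random(seed)
    plain, corrected = [], []
    for _ in range(trials):
        rewards, side = [], []
        for _ in range(n):
            w = rng.gauss(0.0, 1.0)
            noise = rng.gauss(0.0, 1.0)
            rewards.append(1.0 + rho * w + (1 - rho ** 2) ** 0.5 * noise)
            side.append(w)
        plain.append(statistics.fmean(rewards))
        corrected.append(control_variate_mean(rewards, side, 0.0))
    return statistics.variance(plain), statistics.variance(corrected)


if __name__ == "__main__":
    var_plain, var_cv = simulate()
    print(f"variance of plain sample mean:      {var_plain:.5f}")
    print(f"variance of control variate estimate: {var_cv:.5f}")
```

A lower-variance mean estimate permits a tighter confidence interval for the same number of samples, which is the intuition behind the regret gain the abstract describes; the exact confidence bound used by the authors is not reproduced here.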
