Paper Title


A Reliable Reinforcement Learning for Resource Allocation in Uplink NOMA-URLLC Networks

Paper Authors

Waleed Ahsan, Wenqiang Yi, Yuanwei Liu, Arumugam Nallanathan

Paper Abstract


In this paper, we propose a deep state-action-reward-state-action (SARSA) $\lambda$ learning approach for optimising the uplink resource allocation in non-orthogonal multiple access (NOMA) aided ultra-reliable low-latency communication (URLLC). To reduce the mean decoding error probability in time-varying network environments, this work designs a reliable learning algorithm for providing long-term resource allocation, where the reward feedback is based on the instantaneous network performance. With the aid of the proposed algorithm, this paper addresses three main challenges of reliable resource sharing in NOMA-URLLC networks: 1) user clustering; 2) instantaneous feedback system; and 3) optimal resource allocation. All of these designs interact with the considered communication environment. Lastly, we compare the performance of the proposed algorithm with conventional Q-learning and SARSA Q-learning algorithms. The simulation outcomes show that: 1) compared with traditional Q-learning algorithms, the proposed solution is able to converge within 200 episodes while providing a long-term mean error as low as $10^{-2}$; 2) NOMA-assisted URLLC outperforms traditional OMA systems in terms of decoding error probabilities; and 3) the proposed feedback system is efficient for the long-term learning process.
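For readers unfamiliar with the on-policy learning rule the abstract builds on, below is a minimal tabular SARSA($\lambda$) sketch with eligibility traces. It is not the authors' deep SARSA $\lambda$ resource-allocation algorithm: the state space, the action space (standing in for NOMA sub-channel/power choices), the reward (standing in for the instantaneous decoding-error feedback), and all hyperparameters are hypothetical placeholders used only to illustrate the update.

```python
import numpy as np

# Minimal tabular SARSA(lambda) sketch with accumulating eligibility traces.
# All environment details below are placeholders, not the paper's NOMA-URLLC model.
N_STATES, N_ACTIONS = 50, 8                  # assumed discretised network states / allocation actions
ALPHA, GAMMA, LAMBDA, EPS = 0.1, 0.9, 0.8, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))          # action-value estimates
E = np.zeros_like(Q)                         # eligibility traces

def epsilon_greedy(state):
    if np.random.rand() < EPS:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(Q[state]))

def step(state, action):
    """Placeholder environment: in the paper this would be the instantaneous
    decoding-error feedback observed after applying a resource allocation."""
    reward = -np.random.rand()               # e.g. negative decoding-error proxy
    next_state = np.random.randint(N_STATES) # time-varying channel/queue state
    return reward, next_state

for episode in range(200):
    E[:] = 0.0
    s = np.random.randint(N_STATES)
    a = epsilon_greedy(s)
    for t in range(100):
        r, s_next = step(s, a)
        a_next = epsilon_greedy(s_next)      # on-policy: next action from the same policy
        delta = r + GAMMA * Q[s_next, a_next] - Q[s, a]
        E[s, a] += 1.0                       # accumulate trace for the visited pair
        Q += ALPHA * delta * E               # propagate the TD error to all traced pairs
        E *= GAMMA * LAMBDA                  # decay traces
        s, a = s_next, a_next
```

The eligibility traces are what distinguish SARSA($\lambda$) from one-step SARSA or Q-learning: each TD error updates every recently visited state-action pair, which typically speeds convergence in slowly varying environments such as the one described in the abstract.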
