Paper Title

Multi-Agent Reinforcement Learning for Long-Term Network Resource Allocation through Auction: a V2X Application

Authors

Jing Tan, Ramin Khalili, Holger Karl, Artur Hecker

Abstract

We formulate offloading of computational tasks from a dynamic group of mobile agents (e.g., cars) as decentralized decision making among autonomous agents. We design an interaction mechanism that incentivizes such agents to align private and system goals by balancing competition and cooperation. In the static case, the mechanism provably has Nash equilibria with optimal resource allocation. In a dynamic environment, this mechanism's requirement of complete information is impossible to satisfy. For such environments, we propose a novel multi-agent online learning algorithm that learns with partial, delayed, and noisy state information, thus greatly reducing information needs. Our algorithm is also capable of learning from long-term and sparse reward signals with varying delay. Empirical results from the simulation of a V2X application confirm that, through learning, agents with the learning algorithm significantly improve both system and individual performance, reducing offloading failure rate, communication overhead, and load variation by up to 30%, and increasing computation resource utilization and fairness. Results also confirm the algorithm's good convergence and generalization properties in different environments.
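To make the abstract's setting concrete, below is a minimal toy sketch (not the paper's actual mechanism or algorithm) of the two ingredients it combines: agents repeatedly bid in an auction for offloading resources, and each agent updates a simple epsilon-greedy policy from reward signals that arrive with a delay. All class names, bid levels, the second-price payment rule, and the delay model here are illustrative assumptions, not taken from the paper.

```python
import random

class BiddingAgent:
    """Toy mobile agent that learns a bid level via epsilon-greedy updates.
    Rewards are revealed only `delay` steps later, so feedback is queued
    and applied when it becomes due (illustrative, not the paper's method)."""
    def __init__(self, bid_levels, epsilon=0.1, delay=2):
        self.bid_levels = list(bid_levels)
        self.epsilon = epsilon
        self.delay = delay
        self.values = {b: 0.0 for b in bid_levels}  # running mean reward per bid level
        self.counts = {b: 0 for b in bid_levels}
        self.pending = []  # (time_due, bid_level, reward): delayed feedback queue

    def choose_bid(self):
        # Explore a random bid level with probability epsilon, else exploit.
        if random.random() < self.epsilon:
            return random.choice(self.bid_levels)
        return max(self.bid_levels, key=lambda b: self.values[b])

    def observe(self, t, bid, reward):
        # The reward for the bid placed at time t arrives `delay` steps later.
        self.pending.append((t + self.delay, bid, reward))

    def update(self, t):
        # Apply only the feedback that has become due by time t.
        due = [p for p in self.pending if p[0] <= t]
        self.pending = [p for p in self.pending if p[0] > t]
        for _, bid, reward in due:
            self.counts[bid] += 1
            self.values[bid] += (reward - self.values[bid]) / self.counts[bid]

def second_price_auction(bids):
    """Highest bidder wins the offloading slot and pays the second-highest bid."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return order[0], (bids[order[1]] if len(bids) > 1 else 0.0)

random.seed(0)
agents = [BiddingAgent(bid_levels=[1, 2, 3], delay=2) for _ in range(3)]
task_value = 2.5  # assumed private value of successfully offloading a task
for t in range(500):
    bids = [a.choose_bid() for a in agents]
    winner, price = second_price_auction(bids)
    for i, a in enumerate(agents):
        reward = (task_value - price) if i == winner else 0.0
        a.observe(t, bids[i], reward)
        a.update(t)
```

The second-price rule is used here only because it is a standard auction with well-understood incentives; the paper's mechanism, its Nash-equilibrium analysis, and its handling of partial and noisy state go well beyond this sketch.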
