Paper Title
An Energy-aware and Fault-tolerant Deep Reinforcement Learning based approach for Multi-agent Patrolling Problems
Paper Authors
Abstract
Autonomous vehicles are well suited to continuous area patrolling problems. However, finding an optimal patrolling strategy can be challenging for several reasons. First, patrolling environments are often complex and can include unknown environmental factors, such as wind or landscape. Second, autonomous vehicles can have failures or hardware constraints, such as limited battery life. Third, patrolling large areas often requires multiple agents that must coordinate their actions collectively. In this work, we consider these limitations and propose an approach based on model-free, deep multi-agent reinforcement learning. In this approach, agents are trained to patrol an environment with various unknown dynamics and factors, and they can recharge themselves automatically to support continuous collective patrolling. A distributed, homogeneous multi-agent architecture is proposed, in which every patrolling agent executes an identical policy locally, based on its own observations and shared location information. This architecture yields a patrolling system that tolerates agent failures and allows supplementary agents to be added, either to replace failed agents or to improve overall patrol performance. The solution is validated through simulation experiments from multiple perspectives, including overall patrol performance, the efficiency of battery-recharging strategies, fault tolerance, and the ability to cooperate with supplementary agents.
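To make the homogeneous, decentralised-execution idea concrete, below is a minimal sketch (not the authors' implementation): each agent evaluates the same shared policy network on its own local observation concatenated with the broadcast positions of its peers. The observation contents, network sizes, fleet size, and the five-action move set are assumptions for illustration only.

```python
# Minimal sketch of decentralised execution with one shared policy.
# All sizes and observation fields are assumptions, not the paper's design.
import torch
import torch.nn as nn

N_AGENTS = 3                                      # assumed fleet size
PATCH = 5 * 5                                     # assumed local idleness-map patch (5x5 cells)
OBS_DIM = PATCH + 1 + 2 + 2 * (N_AGENTS - 1)      # patch + battery + own xy + peers' xy
N_ACTIONS = 5                                     # assumed: up / down / left / right / recharge

# One homogeneous policy: identical weights are deployed to every agent.
shared_policy = nn.Sequential(
    nn.Linear(OBS_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)

def local_observation(idleness_patch, battery, own_xy, peer_xy):
    """Concatenate an agent's local view with the shared peer locations."""
    return torch.cat([idleness_patch.flatten(), battery, own_xy, peer_xy.flatten()])

# Decentralised execution: each agent evaluates the identical policy on local data,
# so a failed agent can be dropped and a supplementary one added with the same weights.
with torch.no_grad():
    for i in range(N_AGENTS):
        obs = local_observation(
            idleness_patch=torch.rand(5, 5),      # placeholder local patrol-idleness values
            battery=torch.rand(1),                # placeholder remaining battery level
            own_xy=torch.rand(2),                 # placeholder own position
            peer_xy=torch.rand(N_AGENTS - 1, 2),  # placeholder shared peer positions
        )
        action = shared_policy(obs).argmax().item()
        print(f"agent {i} -> action {action}")
```

Because every agent carries the same weights and consumes only local plus broadcast information, adding or removing an agent changes nothing in the policy itself, which is what makes the fault tolerance and supplementary-agent claims architecturally plausible.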