Paper Title


UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach

Authors

Harald Bayerlein, Mirco Theile, Marco Caccamo, David Gesbert

Abstract


Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. While previous approaches, learning and non-learning based, must perform expensive recomputations or relearn a behavior when important scenario parameters such as the number of sensors, sensor positions, or maximum flying time change, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By exploiting a multi-layer map of the environment fed through convolutional network layers to the agent, we show that our proposed network architecture enables the agent to make movement decisions for a variety of scenario parameters that balance the data collection goal with flight time efficiency and safety constraints. Considerable advantages in learning efficiency from using a map centered on the UAV's position over a non-centered map are also illustrated.
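The abstract names two concrete mechanisms: combined experience replay (every training minibatch includes the most recent transition alongside uniformly sampled older ones) and a map observation re-centered on the UAV's position so the agent always sees itself at the same pixel. The following is a minimal, hypothetical Python sketch of both ideas under stated assumptions, not the authors' implementation; the class and function names, the transition format, and the padding scheme are illustrative choices.

```python
import random
from collections import deque


class CombinedReplayBuffer:
    """Combined experience replay: each sampled minibatch always
    contains the latest stored transition, with the rest drawn
    uniformly from the buffer. Hypothetical sketch, not the
    authors' code."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        # transition is e.g. a (state, action, reward, next_state) tuple
        self.buffer.append(transition)

    def sample(self, batch_size):
        latest = self.buffer[-1]
        # uniformly sample the remaining slots from the whole buffer
        rest = random.sample(list(self.buffer),
                             min(batch_size - 1, len(self.buffer)))
        return [latest] + rest


def center_map(grid, pos, pad=0):
    """Return a (2R+1) x (2R+1) copy of a 2-D map `grid` with the UAV
    cell `pos` = (row, col) placed at the center. R is chosen so the
    whole map fits for any UAV position; out-of-map cells are filled
    with `pad`. A per-layer version of this would feed the multi-layer
    map described in the abstract."""
    rows, cols = len(grid), len(grid[0])
    r0, c0 = pos
    R = max(rows, cols) - 1
    size = 2 * R + 1
    out = [[pad] * size for _ in range(size)]
    for r in range(rows):
        for c in range(cols):
            out[R + r - r0][R + c - c0] = grid[r][c]
    return out
```

Centering the observation makes the policy translation-invariant with respect to the UAV's own position, which is one plausible reading of why the paper reports better learning efficiency with the centered map.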
