Paper Title

Deterministic and Stochastic Analysis of Deep Reinforcement Learning for Low Dimensional Sensing-based Navigation of Mobile Robots

Paper Authors

Grando, Ricardo B., de Jesus, Junior C., Kich, Victor A., Kolling, Alisson H., Guerra, Rodrigo S., Drews-Jr, Paulo L. J.

Paper Abstract

Deterministic and Stochastic techniques in Deep Reinforcement Learning (Deep-RL) have become a promising solution for improving motion control and decision-making tasks for a wide variety of robots. Previous works showed that these Deep-RL algorithms can be applied to perform mapless navigation of mobile robots in general. However, they tend to use simple sensing strategies, since it has been shown that they perform poorly with high-dimensional state spaces, such as the ones yielded by image-based sensing. This paper presents a comparative analysis of two Deep-RL techniques - Deep Deterministic Policy Gradients (DDPG) and Soft Actor-Critic (SAC) - when performing mapless navigation tasks for mobile robots. We aim to contribute by showing how the neural network architecture influences the learning itself, presenting quantitative results based on the navigation time and distance of aerial mobile robots for each approach. Overall, our analysis of six distinct architectures highlights that the stochastic approach (SAC) is better suited to deeper architectures, while the opposite happens with the deterministic approach (DDPG).
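
As context for the architecture comparison the abstract describes, the sketch below illustrates the structural difference between a deterministic (DDPG-style) actor and a stochastic (SAC-style) actor, with network depth exposed as a parameter. This is a minimal illustration only: the state dimension, layer widths, and action dimension are assumptions for the low-dimensional sensing setting, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def mlp(in_dim, hidden, depth):
    """Stack `depth` hidden layers; depth is the variable compared in the paper."""
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
        in_dim = hidden
    return nn.Sequential(*layers)

class DeterministicActor(nn.Module):
    """DDPG-style actor: maps a state to a single deterministic action."""
    def __init__(self, state_dim, action_dim, hidden=256, depth=2):
        super().__init__()
        self.body = mlp(state_dim, hidden, depth)
        self.head = nn.Linear(hidden, action_dim)

    def forward(self, state):
        return torch.tanh(self.head(self.body(state)))  # bounded action

class StochasticActor(nn.Module):
    """SAC-style actor: parameterizes a squashed Gaussian over actions."""
    def __init__(self, state_dim, action_dim, hidden=256, depth=2):
        super().__init__()
        self.body = mlp(state_dim, hidden, depth)
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state):
        h = self.body(state)
        std = self.log_std(h).clamp(-20, 2).exp()        # numerical safety
        dist = torch.distributions.Normal(self.mean(h), std)
        return torch.tanh(dist.rsample())                # reparameterized sample

# Illustrative state: e.g., a few range readings plus the target's polar
# coordinates (10-d here is an assumption). Varying `depth` mimics the kind
# of shallow-vs-deep comparison the paper runs across six architectures.
state = torch.randn(1, 10)
for depth in (1, 3):
    print(depth, DeterministicActor(10, 2, depth=depth)(state).shape,
          StochasticActor(10, 2, depth=depth)(state).shape)
```

The key structural difference is the output head: the deterministic actor emits one action per state, while the stochastic actor samples from a learned distribution, which is one plausible reason the two methods respond differently to added depth.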
