Paper Title

Quantum Deep Reinforcement Learning for Robot Navigation Tasks

Paper Authors

Hans Hohenfeld, Dirk Heimann, Felix Wiebe, Frank Kirchner

Paper Abstract

We utilize hybrid quantum deep reinforcement learning to learn navigation tasks for a simple, wheeled robot in simulated environments of increasing complexity. For this, we train parameterized quantum circuits (PQCs) with two different encoding strategies in a hybrid quantum-classical setup, as well as a classical neural network baseline, with the double deep Q network (DDQN) reinforcement learning algorithm. Quantum deep reinforcement learning (QDRL) has previously been studied in several relatively simple benchmark environments, mainly from the OpenAI Gym suite. However, the scaling behavior and applicability of QDRL to more demanding tasks closer to real-world problems, e.g., from the robotics domain, have not been studied previously. Here, we show that quantum circuits in hybrid quantum-classical reinforcement learning setups are capable of learning optimal policies in multiple robotic navigation scenarios with notably fewer trainable parameters than a classical baseline. Across a large number of experimental configurations, we find that the employed quantum circuits outperform the classical neural network baselines when matched in the number of trainable parameters. Yet the classical neural network consistently showed better results in terms of training time and stability, albeit with at least one order of magnitude more trainable parameters than the best-performing quantum circuits. However, when validating the robustness of the learning methods in a large, dynamic environment, we find that the classical baseline produces more stable and better-performing policies overall.
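
For readers unfamiliar with the setup the abstract describes, the sketch below illustrates the general pattern of such a hybrid agent: a parameterized quantum circuit serves as the Q-function approximator inside an otherwise standard DDQN update. This is a minimal illustration, not the paper's actual architecture: the qubit and layer counts, the simple angle encoding, the ring-CNOT ansatz, and the PennyLane/PyTorch stack are all assumptions made here for concreteness.

```python
# Minimal sketch of a PQC Q-function inside a DDQN target computation.
# All architectural choices below (qubit/layer counts, angle encoding,
# ansatz, PennyLane + PyTorch stack) are illustrative assumptions,
# not the paper's reported configuration.
import pennylane as qml
import torch

n_qubits = 4   # assumed: one qubit per observation feature / action
n_layers = 3   # assumed variational depth

dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def pqc_q_values(obs, weights):
    # Data encoding: rotate each qubit by one (rescaled) observation
    # feature. The paper compares two encoding strategies; this plain
    # angle encoding stands in for either.
    for i in range(n_qubits):
        qml.RY(obs[i], wires=i)
    # Trainable layers: general single-qubit rotations plus a ring of
    # CNOTs for entanglement.
    for layer in range(n_layers):
        for i in range(n_qubits):
            qml.Rot(*weights[layer, i], wires=i)
        for i in range(n_qubits):
            qml.CNOT(wires=[i, (i + 1) % n_qubits])
    # One expectation value per discrete action, read out as a Q-value.
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

def ddqn_target(reward, next_obs, done, weights_online, weights_target,
                gamma=0.99):
    # Standard double-DQN target: the online network selects the action,
    # the target network evaluates it. This decoupling is what DDQN adds
    # over DQN and is independent of the quantum setting.
    q_online = torch.stack(list(pqc_q_values(next_obs, weights_online)))
    q_target = torch.stack(list(pqc_q_values(next_obs, weights_target)))
    best_action = torch.argmax(q_online)
    return reward + gamma * q_target[best_action] * (1.0 - done)
```

In a hybrid setup like this, the circuit's expectation values are differentiable through the quantum-classical interface (typically via the parameter-shift rule or simulator backpropagation), so the PQC weights can be trained with the same temporal-difference loss as the classical baseline.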
