Paper Title

Deep Reinforcement Learning for Orienteering Problems Based on Decomposition

Authors

Wei Liu, Tao Zhang, Rui Wang, Kaiwen Li, Wenhua Li, Kang Yang

Abstract

This paper presents a new method for solving the orienteering problem (OP) by decomposing it into two parts: a knapsack problem (KP) and a traveling salesman problem (TSP). A KP solver is responsible for selecting nodes, while a TSP solver is responsible for designing a proper path and assisting the KP solver in judging constraint violations. To handle the constraints, we propose a dual-population coevolutionary algorithm (DPCA) as the KP solver, which simultaneously maintains both feasible and infeasible populations. A dynamic pointer network (DYPN) is introduced as the TSP solver, which takes city locations as inputs and directly outputs a permutation of nodes. The model, trained by reinforcement learning, can capture both the structural and dynamic patterns of the given problem, and it generalizes to other instances with different scales and distributions. Experimental results show that the proposed algorithm outperforms conventional approaches in terms of training, inference, and generalization ability.
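The decomposition described above can be sketched with simple stand-ins: a greedy, prize-ordered selection in place of the DPCA knapsack solver, and a nearest-neighbor tour in place of the DYPN neural TSP solver. The instance format (coordinate list with node 0 as depot), the prize values, and the route-length budget used for the feasibility check are all illustrative assumptions, not details from the paper.

```python
import math

def tour_length(points, order):
    # Length of the closed tour visiting `points` in `order`, returning to the start.
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbor_tour(points, selected):
    # Stand-in for the TSP solver (DYPN in the paper): greedy nearest-neighbor ordering.
    remaining = set(selected)
    tour = [0]            # node 0 taken as the depot (assumption)
    remaining.discard(0)
    while remaining:
        last = tour[-1]
        nxt = min(remaining, key=lambda j: math.dist(points[last], points[j]))
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

def solve_op(points, prizes, budget):
    # Stand-in for the KP solver (DPCA in the paper): greedily add the highest-prize
    # node whose inclusion keeps the resulting tour within the length budget.
    # Note how the TSP solver is the component that judges constraint violations,
    # mirroring the division of labor in the abstract.
    selected = [0]
    for j in sorted(range(1, len(points)), key=lambda j: -prizes[j]):
        trial = selected + [j]
        if tour_length(points, nearest_neighbor_tour(points, trial)) <= budget:
            selected = trial
    tour = nearest_neighbor_tour(points, selected)
    return tour, sum(prizes[j] for j in tour)
```

In the paper, both greedy stand-ins are replaced by learned or evolved components, but the interaction pattern is the same: the node-selection solver proposes a subset, and the routing solver both orders it and reports whether the travel budget is violated.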
