论文标题
使用推出和Max-sat解决定时窗口的电容车辆路由问题
Solving the capacitated vehicle routing problem with timing windows using rollouts and MAX-SAT
论文作者
论文摘要
车辆路由问题是文献中众所周知的NP-HARD组合优化问题。传统的解决方案方法涉及精心设计的启发式方法或耗时的元启发术。强化学习的最新工作一直是一种有希望的替代方法,但发现在解决方案质量方面很难与传统方法竞争。本文提出了一种混合方法,结合了加强学习,政策推广和可满足性的解决方案,以实现计算时间和解决方案质量之间的可调整权衡。在流行的公共数据集中的结果表明,该算法能够比现有基于学习的方法更接近最佳水平,并且计算时间较短。该方法需要最少的设计工作,并且能够在没有额外培训的情况下解决看不见的任意规模问题。此外,该方法可以推广到其他组合优化问题。
The vehicle routing problem is a well known class of NP-hard combinatorial optimisation problems in literature. Traditional solution methods involve either carefully designed heuristics, or time-consuming metaheuristics. Recent work in reinforcement learning has been a promising alternative approach, but has found it difficult to compete with traditional methods in terms of solution quality. This paper proposes a hybrid approach that combines reinforcement learning, policy rollouts, and a satisfiability solver to enable a tunable tradeoff between computation times and solution quality. Results on a popular public data set show that the algorithm is able to produce solutions closer to optimal levels than existing learning based approaches, and with shorter computation times than meta-heuristics. The approach requires minimal design effort and is able to solve unseen problems of arbitrary scale without additional training. Furthermore, the methodology is generalisable to other combinatorial optimisation problems.