Paper Title
An Improved Reinforcement Learning Algorithm for Learning to Branch
Paper Authors
Paper Abstract
Most combinatorial optimization problems can be formulated as mixed integer linear programs (MILPs), for which branch-and-bound (B\&B) is a general and widely used solution method. Recently, learning to branch has become a popular research topic at the intersection of machine learning and combinatorial optimization. In this paper, we propose a novel reinforcement learning-based B\&B algorithm. As in offline reinforcement learning, we initially train on demonstration data to substantially accelerate learning. As training progresses, the agent gradually starts to interact with the environment using its learned policy. Determining the mixing ratio between demonstration and self-generated data is critical to the algorithm's performance, so we propose a prioritized storage mechanism that controls this ratio automatically. To improve the robustness of the training process, a superior network is additionally introduced on top of Double DQN, which always serves as a Q-network with competitive performance. We evaluate the proposed algorithm on three public research benchmarks and compare it against strong baselines, including three classical heuristics and a state-of-the-art imitation learning-based branching algorithm. The results show that the proposed algorithm achieves the best performance among the compared algorithms and has the potential to continuously improve B\&B performance.
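The abstract does not specify how the prioritized storage mechanism is implemented, so the following is only a minimal sketch of one plausible realization, assuming proportional prioritized sampling over a single buffer that mixes demonstration and self-generated transitions. All names (Transition, PrioritizedMixedBuffer, demo_bonus) are hypothetical illustrations, not the paper's actual code: priorities driven by TD error, plus a small bonus for demonstration transitions, let the demo/self-generated mixing ratio adjust automatically as training progresses.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass
class Transition:
    state: Any
    action: int
    reward: float
    next_state: Any
    done: bool
    is_demo: bool  # True if the transition comes from demonstration data


class PrioritizedMixedBuffer:
    """Replay buffer holding demonstration and self-generated transitions.

    Sampling is proportional to per-transition priorities (e.g. TD error),
    so the demo/self-generated mixing ratio adapts automatically as the
    agent's own data becomes more informative.
    """

    def __init__(self, capacity: int, demo_bonus: float = 0.1, eps: float = 1e-3):
        self.capacity = capacity
        self.demo_bonus = demo_bonus  # small constant keeping demos sampleable
        self.eps = eps                # avoids zero priorities
        self.data: List[Transition] = []
        self.priorities: List[float] = []

    def add(self, t: Transition, td_error: float = 1.0) -> None:
        p = abs(td_error) + self.eps + (self.demo_bonus if t.is_demo else 0.0)
        if len(self.data) >= self.capacity:
            # Evict the lowest-priority self-generated transition first,
            # so demonstrations are only overwritten as a last resort.
            candidates = [i for i, x in enumerate(self.data) if not x.is_demo] \
                or list(range(len(self.data)))
            i = min(candidates, key=lambda j: self.priorities[j])
            self.data[i], self.priorities[i] = t, p
        else:
            self.data.append(t)
            self.priorities.append(p)

    def sample(self, batch_size: int) -> Tuple[List[int], List[Transition]]:
        # Proportional sampling: higher-priority transitions are drawn more often.
        idx = random.choices(range(len(self.data)), weights=self.priorities, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update_priorities(self, idx: List[int], td_errors: List[float]) -> None:
        # Called after each Double DQN update with the new TD errors.
        for i, e in zip(idx, td_errors):
            bonus = self.demo_bonus if self.data[i].is_demo else 0.0
            self.priorities[i] = abs(e) + self.eps + bonus
```

Under these assumptions, training would pre-fill the buffer with demonstration transitions, then interleave agent-collected transitions as the learned policy improves; the "superior network" described in the abstract would sit alongside this buffer as the best-performing Q-network retained across Double DQN updates.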