Hierarchical Deep Reinforcement Learning Approach for Multi-Objective Scheduling With Varying Queue Sizes

Paper title

Hierarchical Deep Reinforcement Learning Approach for Multi-Objective Scheduling With Varying Queue Sizes

Authors

Birman, Yoni; Ido, Ziv; Katz, Gilad; Shabtai, Asaf

Abstract

Multi-objective task scheduling (MOTS) is the scheduling of tasks while optimizing multiple and possibly contradicting constraints. A challenging extension of this problem occurs when every individual task is itself a multi-objective optimization problem. While deep reinforcement learning (DRL) has been successfully applied to complex sequential problems, its application to the MOTS domain has been stymied by two challenges. The first challenge is the inability of the DRL algorithm to ensure that every item is processed identically regardless of its position in the queue. The second challenge is the need to manage large queues, which results in large neural architectures and long training times. In this study we present MERLIN, a robust, modular and near-optimal DRL-based approach for multi-objective task scheduling. MERLIN applies a hierarchical approach to the MOTS problem by creating one neural network for the processing of individual tasks and another for the scheduling of the overall queue. In addition to being smaller and having shorter training times, the resulting architecture ensures that an item is processed in the same manner regardless of its position in the queue. Additionally, we present a novel approach for efficiently applying DRL-based solutions to very large queues, and demonstrate how we effectively scale MERLIN to process queue sizes that are orders of magnitude larger than those on which it was trained. Extensive evaluation on multiple queue sizes shows that MERLIN outperforms multiple well-known baselines by a large margin (>22%).
