实施基于GPU的平行最大蚂蚁系统

论文标题

实施基于GPU的平行最大蚂蚁系统

Implementing a GPU-based parallel MAX-MIN Ant System

论文作者

Skinderowicz, Rafał

论文摘要

最大的蚂蚁系统（MMA）是最著名的蚂蚁菌落优化（ACO）算法之一，被证明是有效地找到令人满意的解决方案，以解决许多困难的组合优化问题。摩尔定律的减速以及能够高速进行通用计算的图形处理单元（GPU）的可用性引发了大量的研究工作，以开发基于GPU的ACO实现。在本文中，我们讨论了一系列新颖的想法，以改善基于GPU的平行MMAS实现，从而使其能够更好地利用随后的两个NVIDIA GPU架构提供的计算能力。具体而言，基于加权储层采样算法，我们提出了一种新型的平行节点选择程序，该过程是MMA和其他ACO算法的核心。我们还提出了另一个关键组件 - 禁忌列表结构的内存效率实现，该结构用于ACO的解决方案构造阶段。拟议的实施以及现有方法结合使用，总共导致了六个MMA变体，这些变体对一组旅行推销员问题（TSP）实例进行了评估，范围为198至3,795个城市。结果表明，我们的MMA实施与基于GPU的最先进和基于多核CPU的平行ACO实现具有竞争力：实际上，NVIDIA V100 Volta GPU获得的时代分别高达7.18倍和21.79倍。在解决1,002个城市实例时，提出的MMAS变体中最快的MMA变体能够每秒产生超过100万个候选解决方案。此外，我们表明，与2-OPT局部搜索启发式构度相结合，所提出的平行MMA可为TSP实例找到高质量的解决方案，该实例具有多达18,512个节点。

The MAX-MIN Ant System (MMAS) is one of the best-known Ant Colony Optimization (ACO) algorithms proven to be efficient at finding satisfactory solutions to many difficult combinatorial optimization problems. The slow-down in Moore's law, and the availability of graphics processing units (GPUs) capable of conducting general-purpose computations at high speed, has sparked considerable research efforts into the development of GPU-based ACO implementations. In this paper, we discuss a range of novel ideas for improving the GPU-based parallel MMAS implementation, allowing it to better utilize the computing power offered by two subsequent Nvidia GPU architectures. Specifically, based on the weighted reservoir sampling algorithm we propose a novel parallel implementation of the node selection procedure, which is at the heart of the MMAS and other ACO algorithms. We also present a memory-efficient implementation of another key-component -- the tabu list structure -- which is used in the ACO's solution construction stage. The proposed implementations, combined with the existing approaches, lead to a total of six MMAS variants, which are evaluated on a set of Traveling Salesman Problem (TSP) instances ranging from 198 to 3,795 cities. The results show that our MMAS implementation is competitive with state-of-the-art GPU-based and multi-core CPU-based parallel ACO implementations: in fact, the times obtained for the Nvidia V100 Volta GPU were up to 7.18x and 21.79x smaller, respectively. The fastest of the proposed MMAS variants is able to generate over 1 million candidate solutions per second when solving a 1,002-city instance. Moreover, we show that, combined with the 2-opt local search heuristic, the proposed parallel MMAS finds high-quality solutions for the TSP instances with up to 18,512 nodes.

下载PDF全文

下载文献需遵守相关版权规定

论文标题