Paper Title


Accelerating Deep Neuroevolution on Distributed FPGAs for Reinforcement Learning Problems

Paper Authors

Alexis Asseman, Nicolas Antoine, Ahmet S. Ozcan

Paper Abstract


Reinforcement learning, augmented by the representational power of deep neural networks, has shown promising results on high-dimensional problems, such as game playing and robotic control. However, the sequential nature of these problems poses a fundamental challenge for computational efficiency. Recently, alternative approaches such as evolutionary strategies and deep neuroevolution demonstrated competitive results with faster training times on distributed CPU cores. Here, we report record training times (running at about 1 million frames per second) for Atari 2600 games using deep neuroevolution implemented on distributed FPGAs. The combined hardware implementation of the game console, image pre-processing, and the neural network in an optimized pipeline, multiplied by system-level parallelism, enabled the acceleration. These results are the first application demonstration on the IBM Neural Computer, a custom-designed system consisting of 432 Xilinx FPGAs interconnected in a 3D mesh network topology. In addition to high performance, experiments also showed improved accuracy on all games compared to the CPU implementation of the same algorithm.
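To illustrate the kind of algorithm the abstract refers to, below is a minimal, hypothetical sketch of gradient-free neuroevolution: a population of parameter vectors is evaluated, the fittest are kept as elites, and the next generation is produced by Gaussian mutation. The toy `evaluate` fitness function stands in for the Atari episode scores that the paper computes on FPGA-hosted game consoles; nothing here reflects the authors' actual implementation.

```python
import random

def evaluate(params):
    # Toy stand-in for an episode return (the paper uses Atari game scores).
    # Optimum is the vector of all 0.5s.
    return -sum((p - 0.5) ** 2 for p in params)

def mutate(params, rng, sigma=0.1):
    # Neuroevolution perturbs all parameters with Gaussian noise; no gradients.
    return [p + rng.gauss(0, sigma) for p in params]

def neuroevolution(dim=8, pop_size=50, elites=5, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[:elites]
        # Elitism: carry over the best individual unchanged, then fill the
        # rest of the generation with mutated copies of random parents.
        population = [parents[0]] + [
            mutate(rng.choice(parents), rng) for _ in range(pop_size - 1)
        ]
    return max(population, key=evaluate)

best = neuroevolution()
```

The paper's speedup comes from running many such fitness evaluations in parallel, with the game emulator, pre-processing, and network inference pipelined in hardware on each FPGA.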
