Paper Title

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

Authors

Haichen Shen, Jared Roesch, Zhi Chen, Wei Chen, Yong Wu, Mu Li, Vin Sharma, Zachary Tatlock, Yida Wang

Abstract


Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures, and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks, which assume a pre-determined model architecture and input data shapes; these assumptions are violated by dynamic neural networks. Therefore, executing dynamic models with deep learning systems is currently both inflexible and sub-optimal, if not impossible. Optimizing dynamic neural networks is more challenging than optimizing static neural networks; optimizations must consider all possible execution paths and tensor shapes. This paper proposes Nimble, a high-performance and flexible system to optimize, compile, and execute dynamic neural networks on multiple platforms. Nimble handles model dynamism by introducing a dynamic type system, a set of dynamism-oriented optimizations, and a light-weight virtual machine runtime. Our evaluation demonstrates that Nimble outperforms state-of-the-art deep learning frameworks and runtime systems for dynamic neural networks by up to 20x on hardware platforms including Intel CPUs, ARM CPUs, and Nvidia GPUs.
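To make the abstract's notion of "dynamism" concrete, the sketch below (plain NumPy, not Nimble's API; the bucketing helper and its names are our own illustration) shows two cases a static compiler cannot fully plan for: an input whose shape is only known at run time, and an operator whose output shape depends on the input *values*, not just the input shape.

```python
import numpy as np

def pad_to_bucket(x, buckets=(8, 16, 32)):
    """A common static-compiler workaround (hypothetical helper): pad a
    variable-length input up to the nearest pre-compiled "bucket" shape,
    trading wasted compute for a fixed set of static shapes."""
    n = x.shape[0]
    target = next(b for b in buckets if b >= n)
    return np.pad(x, (0, target - n))

# 1) Dynamic input shape: the sequence length is only known at run time,
#    so a purely static system must pad (or recompile) to handle it.
seq = np.arange(5)
padded = pad_to_bucket(seq)
assert padded.shape == (8,)

# 2) Data-dependent output shape: np.nonzero's result shape depends on
#    the values in the tensor, so no single static shape can be assigned
#    at compile time -- bucketing does not help here at all.
mask = np.array([1, 0, 3, 0])
idx = np.nonzero(mask)[0]
assert idx.shape == (2,)
```

A system like Nimble sidesteps such workarounds by typing unknown dimensions symbolically and resolving them in a lightweight runtime, per the abstract.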
