Paper Title
Memory Optimization for Deep Networks
Paper Authors
Paper Abstract
Deep learning is slowly, but steadily, hitting a memory bottleneck. While the tensor computation in top-of-the-line GPUs increased by 32x over the last five years, the total available memory only grew by 2.5x. This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs. In this paper, we present MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks. MONeT jointly optimizes the checkpointing schedule and the implementation of various operators. MONeT is able to outperform all prior hand-tuned operations as well as automated checkpointing. MONeT reduces the overall memory requirement by 3x for various PyTorch models, with a 9-16% overhead in computation. For the same computation cost, MONeT requires 1.2-1.8x less memory than current state-of-the-art automated checkpointing frameworks. Our code is available at https://github.com/utsaslab/MONeT.
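The memory/compute trade-off described in the abstract comes from activation checkpointing: instead of storing every intermediate output for the backward pass, selected activations are discarded and recomputed when needed. Below is a minimal sketch of this idea using PyTorch's built-in torch.utils.checkpoint utilities; it is not MONeT's joint schedule/operator optimization, and the layer stack, tensor sizes, and segment count are illustrative assumptions.

```python
# Minimal sketch of activation checkpointing with stock PyTorch (not MONeT itself).
# Activations inside each checkpointed segment are not stored during the forward pass;
# they are recomputed during the backward pass, cutting peak memory at the cost of
# extra computation.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of layers whose intermediate outputs would normally all be kept in memory.
model = nn.Sequential(
    *[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(32)]
)
x = torch.randn(64, 1024, requires_grad=True)

# Split the stack into 4 segments: only segment-boundary activations are stored;
# everything inside a segment is recomputed on the backward pass.
out = checkpoint_sequential(model, 4, x)
loss = out.sum()
loss.backward()
```

MONeT, as described above, goes beyond this uniform segment split by jointly choosing which activations to store or recompute and which operator implementations to use, which is how it reaches a better memory/compute trade-off than checkpointing alone.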