Paper Title
Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines
Paper Authors
Paper Abstract
Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with hardware innovations (e.g., faster GPUs, TPUs, and interconnects) and advanced parallelization techniques that yield better scalability. At the same time, the amount of training data needed to train increasingly complex models is growing. As a consequence of this development, data preprocessing and provisioning are becoming a severe bottleneck in end-to-end deep learning pipelines. In this paper, we provide an in-depth analysis of data preprocessing pipelines from four different machine learning domains. We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines and extract individual trade-offs to optimize throughput, preprocessing time, and storage consumption. Additionally, we provide an open-source profiling library that can automatically decide on a suitable preprocessing strategy to maximize throughput. By applying our generated insights to real-world use cases, we obtain an increased throughput of 3x to 13x compared to an untuned system while keeping the pipeline functionally identical. These findings show the enormous potential of data pipeline tuning.
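The core trade-off the abstract describes (throughput vs. preprocessing time vs. storage consumption) can be illustrated with a minimal sketch. The example below is purely illustrative and not taken from the paper or its profiling library: it contrasts an "online" strategy that decompresses every sample on each epoch (low storage, repeated CPU work) with a "materialized" strategy that preprocesses once and caches the result (higher storage, cheap epochs), while both remain functionally identical.

```python
import zlib

# Hypothetical dataset: each "sample" is stored compressed on disk.
# All names and sizes here are illustrative assumptions, not from the paper.
raw_samples = [bytes([i % 256]) * 4096 for i in range(200)]
stored = [zlib.compress(s) for s in raw_samples]  # "on-disk" format

def online_epoch(dataset):
    """Online strategy: decompress every sample on the fly, each epoch."""
    return [zlib.decompress(s) for s in dataset]

def materialize(dataset):
    """Materialized strategy: preprocess once, keep the result in storage."""
    return [zlib.decompress(s) for s in dataset]

storage_online = sum(len(s) for s in stored)   # bytes kept by online strategy
cache = materialize(stored)                     # one-time preprocessing cost
storage_cached = sum(len(s) for s in cache)     # bytes kept by cached strategy

# The cached strategy spends more storage but skips per-epoch preprocessing;
# both strategies yield exactly the same samples to the training process.
assert storage_cached > storage_online
assert online_epoch(stored) == cache
```

Tuning, in this framing, means picking the materialization point per pipeline stage so that the training process is never starved, which is the decision the paper's profiling library automates.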