论文标题

数据加载器景观的概述:比较性能分析

An Overview of the Data-Loader Landscape: Comparative Performance Analysis

论文作者

Ofeidis, Iason, Kiedanski, Diego, Tassiulas, Leandros

论文摘要

负责将数据从存储转移到GPU的同时,在培训机器学习模型的同时,数据加载器可能会大大提高培训工作的绩效。最近的进步不仅通过大大减少训练时间,而且还提供了新功能,例如从远程存储(如S3)加载数据,这表明了希望。在本文中,我们是第一个将数据加载器区分为深度学习(DL)工作流程中的单独组件并概述其结构和功能的单独组件。最后,我们对可用的不同数据库,在功能,可用性和性能方面的权衡以及从中获得的见解进行了全面比较。

Dataloaders, in charge of moving data from storage into GPUs while training machine learning models, might hold the key to drastically improving the performance of training jobs. Recent advances have shown promise not only by considerably decreasing training time but also by offering new features such as loading data from remote storage like S3. In this paper, we are the first to distinguish the dataloader as a separate component in the Deep Learning (DL) workflow and to outline its structure and features. Finally, we offer a comprehensive comparison of the different dataloading libraries available, their trade-offs in terms of functionality, usability, and performance and the insights derived from them.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源