Dataloaders, which move data from storage to GPUs during the training of machine learning models, may hold the key to drastically improving the performance of training jobs. Recent advances have shown promise, not only by considerably reducing training time but also by offering new features such as loading data from remote storage like S3. In this paper, we are the first to distinguish the dataloader as a separate component in the Deep Learning (DL) workflow and to outline its structure and features. Finally, we offer a comprehensive comparison of the available dataloading libraries, their trade-offs in functionality, usability, and performance, and the insights derived from them.