Deep learning has become one of the most compute- and data-intensive methods and is widely used in many research areas and businesses. A critical challenge is that a deep learning system has many adjustable parameters, and their optimal values must be determined to achieve faster operation and high accuracy. This paper focuses on the adjustable parameters of the dataloader, the component that groups the data appropriately and loads it into main memory for the deep learning model to use. We introduce Dataloader Parameter Tuner (DPT), an automated framework that determines the optimal values of the parameters the dataloader requires. DPT uses grid search to find the optimal number of dataloader subprocesses (i.e., workers) and the optimal prefetch factor, accelerating data transfer in machine learning systems.
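The grid search over the worker count and prefetch factor can be illustrated with a minimal sketch. This is not the DPT implementation: the names `grid_search`, `time_one_pass`, `make_loader`, and `max_batches` are hypothetical, and the sketch assumes that the wall-clock time to draw a fixed number of batches is a reasonable proxy for data-transfer speed.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, Sequence, Tuple

Config = Tuple[int, int]  # (number of workers, prefetch factor)


def time_one_pass(loader: Iterable, max_batches: int) -> float:
    """Return the seconds needed to draw up to max_batches batches from loader."""
    start = time.perf_counter()
    for i, _batch in enumerate(loader):
        if i + 1 >= max_batches:
            break
    return time.perf_counter() - start


def grid_search(
    make_loader: Callable[[int, int], Iterable],
    workers: Sequence[int],
    prefetch_factors: Sequence[int],
    max_batches: int = 50,
    timer: Callable[[Iterable, int], float] = time_one_pass,
) -> Tuple[Config, Dict[Config, float]]:
    """Time every (workers, prefetch factor) pair and return the fastest one.

    make_loader is a hypothetical factory that builds a fresh dataloader
    for the given worker count and prefetch factor.
    """
    results: Dict[Config, float] = {}
    for w, p in itertools.product(workers, prefetch_factors):
        results[(w, p)] = timer(make_loader(w, p), max_batches)
    # The best configuration is the one with the smallest measured time.
    best = min(results, key=results.get)
    return best, results
```

With PyTorch, `make_loader` could, for instance, wrap `torch.utils.data.DataLoader(dataset, batch_size=..., num_workers=w, prefetch_factor=p)`. PyTorch accepts `prefetch_factor` only when `num_workers` is greater than zero, so the worker grid should start at 1. Timings are noisy in practice, so repeating each measurement and discarding a warm-up pass may give more stable results.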