In this paper, we present PARTIME, a software library written in Python and based on PyTorch, designed specifically to speed up neural networks whenever data is continuously streamed over time, for both learning and inference. Existing libraries are designed to exploit data-level parallelism, assuming that samples are batched, a condition that is not naturally met in applications based on streamed data. In contrast, PARTIME starts processing each data sample as soon as it becomes available from the stream. PARTIME wraps the code that implements a feed-forward multi-layer network and distributes the layer-wise processing among multiple devices, such as Graphics Processing Units (GPUs). Thanks to its pipeline-based computational scheme, PARTIME allows the devices to perform computations in parallel. At inference time, this results in scaling capabilities that are theoretically linear with respect to the number of devices. During the learning stage, PARTIME can leverage the non-i.i.d. nature of the streamed data, whose samples evolve smoothly over time, for efficient gradient computations. Experiments empirically compare PARTIME with classic non-parallel neural computations in online learning, distributing operations over up to 8 NVIDIA GPUs and showing significant speedups that are almost linear in the number of devices, mitigating the impact of the data transfer overhead.
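The layer-wise pipelining described above can be illustrated with a minimal, self-contained sketch. This is not PARTIME's actual API: it uses plain Python threads and queues as stand-ins for GPU devices, and simple functions as stand-ins for network layers, to show how consecutive stream samples can occupy different pipeline stages at the same time.

```python
import queue
import threading

def run_pipeline(stages, samples):
    """Process a stream of samples through a layer-wise pipeline.

    Each stage runs in its own worker thread (a stand-in for one GPU),
    so while one sample is in stage k, the next sample can already be
    processed by stage k-1, mimicking pipeline parallelism.
    """
    # One queue between each pair of consecutive stages (plus input/output).
    qs = [queue.Queue() for _ in range(len(stages) + 1)]
    SENTINEL = object()  # marks the end of the stream

    def worker(fn, q_in, q_out):
        while True:
            x = q_in.get()
            if x is SENTINEL:
                q_out.put(SENTINEL)  # propagate shutdown downstream
                return
            q_out.put(fn(x))

    threads = [
        threading.Thread(target=worker, args=(fn, qs[i], qs[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Feed the stream: each sample enters as soon as it is "available".
    for s in samples:
        qs[0].put(s)
    qs[0].put(SENTINEL)

    # Collect outputs in stream order.
    outputs = []
    while True:
        y = qs[-1].get()
        if y is SENTINEL:
            break
        outputs.append(y)
    for t in threads:
        t.join()
    return outputs

# Hypothetical "layers": in a real setting these would be network blocks
# placed on distinct devices.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(stages, [1, 2, 3, 4]))  # [1, 3, 5, 7]
```

With `d` stages and a long stream, the steady-state throughput of such a scheme approaches one sample per stage latency, which is the source of the (theoretically) linear scaling with the number of devices mentioned above.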