Training deep neural networks (DNNs) is time-consuming. While most existing solutions try to overlap/schedule computation and communication for efficient training, this paper goes one step further by skipping computation and communication through DNN layer freezing. Our key insight is that the training progress of internal DNN layers differs significantly, and front layers often become well-trained much earlier than deep layers. To explore this, we first introduce the notion of training plasticity to quantify the training progress of internal DNN layers. Then we design Egeria, a knowledge-guided DNN training system that employs semantic knowledge from a reference model to accurately evaluate individual layers' training plasticity and safely freeze the converged ones, saving their corresponding backward computation and communication. Our reference model is generated on the fly using quantization techniques and runs forward operations asynchronously on available CPUs to minimize the overhead. In addition, Egeria caches the intermediate outputs of the frozen layers with prefetching to further skip the forward computation. Our implementation and testbed experiments with popular vision and language models show that Egeria achieves 19%-43% training speedup w.r.t. the state-of-the-art without sacrificing accuracy.
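To make the freezing mechanism concrete, the sketch below illustrates the general idea in PyTorch: take a quantized, forward-only snapshot as the reference model, estimate each layer's training plasticity by comparing its activations against the reference, and freeze layers whose plasticity falls below a threshold so their backward computation (and gradient communication) is skipped. The specific plasticity metric (cosine similarity of activations), the threshold, and the helper names are illustrative assumptions, not Egeria's exact design.

```python
# Minimal, illustrative sketch of knowledge-guided layer freezing.
# Assumptions: a PyTorch model whose top-level children are the "layers" of
# interest; cosine similarity to a quantized reference model as a plasticity
# proxy; a fixed freezing threshold. These are simplifications for exposition.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_reference(model: nn.Module) -> nn.Module:
    """Build a lightweight reference model: a quantized, frozen snapshot
    that runs forward-only (e.g., asynchronously on spare CPUs)."""
    ref = copy.deepcopy(model).cpu().eval()
    ref = torch.ao.quantization.quantize_dynamic(ref, {nn.Linear}, dtype=torch.qint8)
    for p in ref.parameters():
        p.requires_grad_(False)
    return ref


@torch.no_grad()
def layer_plasticity(train_feat: torch.Tensor, ref_feat: torch.Tensor) -> float:
    """Plasticity proxy: 1 - cosine similarity between a layer's activations in
    the training model and in the reference model. A low value suggests the
    layer's representation has stabilized (i.e., it is effectively converged)."""
    a = F.normalize(train_feat.flatten(1).float(), dim=1)
    b = F.normalize(ref_feat.flatten(1).float(), dim=1)
    return 1.0 - (a * b).sum(dim=1).mean().item()


def freeze_converged_layers(model: nn.Module, plasticity: dict, threshold: float = 0.05):
    """Freeze layers whose plasticity dropped below the threshold, skipping
    their backward pass and, in distributed training, their gradient sync."""
    frozen = []
    for name, module in model.named_children():
        if plasticity.get(name, 1.0) < threshold:
            for p in module.parameters():
                p.requires_grad_(False)
            frozen.append(name)
    return frozen
```

In a training loop, one would periodically refresh the reference snapshot, compute per-layer plasticity on a held-out batch, and call `freeze_converged_layers`; the outputs of the frozen front layers could then be cached and prefetched so that their forward computation is skipped as well.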