Over the past decade, machine learning model complexity has grown at an extraordinary rate, as has the scale of the systems used to train such large models. However, hardware utilization in large-scale AI systems remains alarmingly low (5-20%). This low utilization is the cumulative effect of minor losses across different layers of the stack, exacerbated by the disconnect between the engineers who design each layer, often working in different industries. We propose CrossFlow, a novel framework that enables cross-layer analysis all the way from the technology layer to the algorithmic layer. We also propose DeepFlow, built on top of CrossFlow using machine learning techniques, to automate design space exploration and co-optimization across the layers of the stack. We have validated CrossFlow's accuracy against distributed training on real commercial hardware, and we showcase several DeepFlow case studies demonstrating the pitfalls of not optimizing across the technology-hardware-software stack for what is likely the most important workload driving large development investments in all aspects of the computing stack.