End-to-end (E2E) artificial intelligence (AI) pipelines are composed of several stages, including data preprocessing, data ingestion, model definition and training, hyperparameter optimization, deployment, inference, postprocessing, and downstream analyses. To obtain an efficient E2E workflow, nearly all stages of the pipeline must be optimized. Intel Xeon processors offer large memory capacities and built-in AI acceleration (e.g., Intel Deep Learning Boost); they are well suited to running multiple instances of training and inference pipelines in parallel and have a low total cost of ownership (TCO). To showcase performance on Xeon processors, we applied comprehensive optimization strategies, coupled with software and hardware acceleration, to a variety of E2E pipelines in areas such as computer vision, NLP, and recommendation systems, achieving performance improvements ranging from 1.8x to 81.7x across the different pipelines. In this paper, we highlight the optimization strategies we adopted to achieve this performance on Intel Xeon processors across a set of eight different E2E pipelines.
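The multi-instance parallelism mentioned above is commonly achieved by pinning each pipeline instance to a disjoint subset of cores. The sketch below illustrates the idea; it is a hypothetical example, not taken from the paper, and the instance count, cores-per-instance value, and commented-out `numactl` launch line are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical sketch (not from the paper): partition a Xeon socket's cores
# so several inference instances can run in parallel, one per core subset.
NUM_INSTANCES=4
CORES_PER_INSTANCE=8
for i in $(seq 0 $((NUM_INSTANCES - 1))); do
  start=$((i * CORES_PER_INSTANCE))
  end=$((start + CORES_PER_INSTANCE - 1))
  # A real run would pin and launch the workload, e.g.:
  #   numactl --physcpubind=${start}-${end} --membind=0 python infer.py &
  echo "instance $i -> cores ${start}-${end}"
done
# wait   # would block until all background instances finish
```

Pinning with `numactl` (or a framework-level equivalent) avoids cross-instance cache and NUMA interference, which is one reason multiple smaller instances can outperform a single instance spanning the whole socket.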