Performance Analysis of Deep Learning Workloads on a Composable System

A composable infrastructure pools resources such as compute, storage, accelerators, and networking so that they can be shared and grouped into different configurations to meet application requirements. This freedom to "mix and match" resources dynamically allows experimentation early in the design cycle, before the final architectural design or hardware implementation of a system. Such a design offers the flexibility to serve a variety of workloads and provides a dynamic co-design platform on which experiments and measurements can be carried out in a controlled manner. For instance, key performance bottlenecks can be revealed early in the experimentation phase, avoiding costly and time-consuming mistakes. Additionally, various system-level topologies can be evaluated when experimenting with new Systems on Chip (SoCs) and new accelerator types. This paper details the design of an enterprise composable infrastructure that we have implemented and made available to our partners in the IBM Research AI Hardware Center (AIHC). Our experiments on the composable system give insight into how the system works and assess the impact of various resource aggregations and reconfigurations on representative deep learning benchmarks.
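To make the "pool and compose" idea concrete, here is a minimal Python sketch. It assumes a deliberately simple model in which resources are interchangeable units counted by type. The class `ResourcePool` and its methods `compose` and `release` are hypothetical names chosen for illustration; they are not the interface of the AIHC system. The sketch ignores topology and connectivity, which a real composable system would also need to track.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical shared pool of resources, tracked as counts per type."""
    free: Counter = field(default_factory=Counter)

    def compose(self, request: dict) -> Counter:
        """Reserve a configuration such as {'cpu': 1, 'gpu': 4}.

        The pool is left unchanged if any requested type is short.
        """
        req = Counter(request)
        # Counter returns 0 for absent keys, so unknown types count as missing.
        missing = {k: v - self.free[k] for k, v in req.items() if self.free[k] < v}
        if missing:
            raise RuntimeError(f"insufficient resources: {missing}")
        self.free -= req  # in-place subtraction drops types whose count reaches 0
        return req

    def release(self, config: Counter) -> None:
        """Return a previously composed configuration to the shared pool."""
        self.free += config


if __name__ == "__main__":
    pool = ResourcePool(Counter({"cpu": 2, "gpu": 8, "nvme": 4}))
    train = pool.compose({"cpu": 1, "gpu": 4})            # GPU-heavy configuration
    infer = pool.compose({"cpu": 1, "gpu": 2, "nvme": 2})  # storage-heavy configuration
    print(dict(pool.free))
    pool.release(train)  # recompose: return the GPUs for another experiment
    print(dict(pool.free))
```

The design choice here is that a failed request raises before anything is subtracted, so an experiment that asks for more accelerators than the pool holds does not leave the pool in a partially reserved state.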
