VIP内容

摘要: 随着人工智能和大数据等计算机应用对算力需求的迅猛增长以及应用场景的多样化, 异构混合并行计算成为了研究的重点。文中介绍了当前主要的异构计算机体系结构, 包括CPU/协处理器、CPU/众核处理器、CPU/ASCI和CPU/FPGA等;简述了异构混合并行编程模型随着各类异构混合结构的发展而做出的改变, 异构混合并行编程模型可以是对现有的一种语言进行改造和重新实现, 或者是现有异构编程语言的扩展, 或者是使用指导性语句异构编程, 或者是容器模式协同编程。分析表明, 异构混合并行计算架构会进一步加强对AI的支持, 同时也会增强软件的通用性。文中还回顾了异构混合并行计算中的关键技术, 包括异构处理器之间的并行任务划分、任务映射、数据通信、数据访问, 以及异构协同的并行同步和异构资源的流水线并行等。根据这些关键技术, 文中指出了异构混合并行计算面临的挑战, 如编程困难、移植困难、数据通信开销大、数据访问复杂、并行控制复杂以及资源负载不均衡等。最后分析了异构混合并行计算面临的挑战, 指出目前关键的核心技术需要从通用与AI专用异构计算的融合、异构架构的无缝移植、统一编程模型、存算一体化、智能化任务划分和分配等方面进行突破。

http://www.jsjkx.com/CN/10.11896/jsjkx.200600045

成为VIP会员查看完整内容
0
18

最新论文

We present a novel characterization of the mapping of multiple parallelism forms (e.g. data and model parallelism) onto hierarchical accelerator systems that is hierarchy-aware and greatly reduces the space of software-to-hardware mapping. We experimentally verify the substantial effect of these mappings on all-reduce performance (up to 448x). We offer a novel syntax-guided program synthesis framework that is able to decompose reductions over one or more parallelism axes to sequences of collectives in a hierarchy- and mapping-aware way. For 69% of parallelism placements and user requested reductions, our framework synthesizes programs that outperform the default all-reduce implementation when evaluated on different GPU hierarchies (max 2.04x, average 1.27x). We complement our synthesis tool with a simulator exceeding 90% top-10 accuracy, which therefore reduces the need for massive evaluations of synthesis results to determine a small set of optimal programs and mappings.

0
0
下载
预览
Top