Research on MPI-Based Model Extension and Performance Optimization on Many-Core Clusters

Project title: Research on MPI-Based Model Extension and Performance Optimization on Many-Core Clusters

Project number: No.61502450

Project type: Young Scientists Fund project (青年科学基金项目)

Year of establishment/approval: 2016

Discipline: Automation technology, computer technology

Project author: Li Shigang (李士刚)

Author affiliation: Institute of Computing Technology, Chinese Academy of Sciences

Funding amount: 200,000 CNY

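Abstract (translated from Chinese): Many-core clusters have gradually become the mainstream architecture for supercomputers. Their massive intra-node parallelism and complex hardware architecture make it difficult for traditional programming models and optimization techniques to cope with problems such as load imbalance and nonlinear scalability in parallel applications. To address these parallelization challenges on many-core cluster systems, this project builds on MPI, an important programming model in high-performance computing, and studies model extension and performance optimization. The main contents are: 1) Programming model extension for irregular applications. Targeting the load imbalance of traditional irregular applications and the irregular communication patterns of emerging large-scale deep learning algorithms, the MPI model is extended to efficiently support task parallelism and active-message communication on many-core clusters; 2) Communication performance modeling. Multi-level hardware information is abstracted, and in particular the communication overhead on cache-coherent architectures is modeled, yielding a novel communication performance model for many-core clusters that in turn guides the performance optimization of parallel software; 3) Communication performance optimization. Through shared address spaces, topology awareness, and coordinated optimization across multiple communication levels, MPI communication overhead within multi-core and many-core nodes is reduced and hidden, and a performance-model-based method for adaptively selecting the optimal algorithm is used to automatically tune the implementation of communication interfaces.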

Keywords (translated from Chinese): MPI; many-core clusters; irregular applications; deep learning; performance model

English abstract: Many-core clusters, which feature massive intra-node parallelism and complex hardware, have gradually become the mainstream supercomputer architecture. Traditional programming models and optimization techniques struggle to handle the workload imbalance and nonlinear scalability of parallel applications on these systems. To address these problems, this project builds on MPI, one of the most important programming models in high-performance computing, and investigates MPI model extension and performance optimization. The main contents include: 1) Programming model extension for irregular applications. To efficiently express traditional irregular applications with workload imbalance and emerging large-scale deep learning algorithms with irregular communication patterns, we extend the MPI programming model to support irregular task parallelism and active messages; 2) Communication performance model. We establish a novel performance model that abstracts multi-level hardware details, especially for cache-coherent architectures, predicts communication cost, and guides the performance tuning of parallel software; 3) Communication performance optimization. Using shared address spaces, topology-aware communication, and multi-level collaborative optimization, intra-node communication overhead is reduced and overlapped with computation. Communication interfaces are automatically tuned based on the performance model.

English keywords: MPI; Many-core clusters; Irregular applications; Deep learning; Performance model
