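Project title: Key Technologies for NIC-Based Collective Communication Offload in Exascale Computers
Project number: No.61202124
Project type: Young Scientists Fund project
Approval year: 2013
Discipline: Computer Science
Author: Wang Shaogang (王绍刚)
Institution: National University of Defense Technology, Chinese People's Liberation Army
Funding: 240,000 CNY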
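Abstract (Chinese, translated): Collective communication offload based on the network interface controller (NIC) is an important means of relieving the communication bottleneck of parallel applications. In the context of Exascale computer systems, next-generation NIC-based collective offload faces challenges from many-core processors, explosive growth in system scale, and complex interconnection networks, so research on new NIC architectures is urgently needed. This project proposes a new hardware/software architecture for collective communication offload in which software generates the algorithm framework and hardware provides programmable primitives. This approach reduces the complexity of the hardware implementation and addresses the need to support many-core processors efficiently and to scale to over 100,000 nodes. Under the new architecture, the project will also study key technologies for new collective communication features, including interconnect topology awareness, non-blocking collectives, and neighborhood communication patterns. The research targets a set of key problems, namely the hardware/software interface, extraction of algorithm frameworks, hardware primitive design, and NIC architecture, and aims to provide theoretical and technical support for the design and implementation of NICs in next-generation high-performance computers.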
Keywords (Chinese, translated): collective communication; parallel computer; Exascale computing; offload; RDMA
English abstract: NIC (Network Interface Controller) based collective communication offload technology is an important way to alleviate the communication bottleneck of current parallel applications. For next-generation exascale parallel computer systems, NIC-based collective communication offload faces new challenges, such as new many-core processor architectures, the explosive growth of system size, and large-scale system networks. Research on new NIC architectures is therefore required to offload collective communication efficiently on next-generation exascale parallel systems. This project proposes a new software-hardware architecture to address these challenges. The new architecture relies on software to generate the algorithm framework, which is run on simple programmable hardware units. This NIC-based collective communication offload architecture greatly reduces hardware design complexity and resource overhead; meanwhile, it can efficiently support many-core processor architectures, and its scalability can easily support system sizes of over 100,000 nodes. Under the new architecture, the project plans to research a collective communication offload engine for complex network topologies, non-blocking communication, sparse collective communication, etc. Our research target is to make br
English keywords: collective communication; parallel computer; Exascale computing; offloading; RDMA
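The abstract describes the software/hardware split only at a high level: software generates the algorithm framework, and the NIC executes it using simple programmable primitives. The Python sketch below illustrates one possible reading of that split. It assumes a binomial-tree broadcast as the collective and uses hypothetical `SEND`/`RECV` primitives; neither the primitive set nor the algorithm comes from the project itself. The software side compiles the collective into a per-rank list of primitive operations, and a simulated "hardware" side only matches and executes those operations in step order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str  # hypothetical primitive name: "SEND" or "RECV"
    peer: int  # partner rank for this operation
    step: int  # logical step in which the operation fires


def binomial_bcast_schedule(nranks: int, root: int = 0) -> dict[int, list[Op]]:
    """Software side (sketch): compile a binomial-tree broadcast into per-rank primitives."""
    sched: dict[int, list[Op]] = {r: [] for r in range(nranks)}
    mask, step = 1, 0
    while mask < nranks:
        # Virtual ranks [0, mask) already hold the data; each forwards to vr + mask.
        for vr in range(mask):
            vpeer = vr + mask
            if vpeer < nranks:
                src = (vr + root) % nranks
                dst = (vpeer + root) % nranks
                sched[src].append(Op("SEND", dst, step))
                sched[dst].append(Op("RECV", src, step))
        mask <<= 1
        step += 1
    return sched


def execute(sched: dict[int, list[Op]], root: int, payload):
    """Simulated NIC side (sketch): step by step, match each SEND to a RECV and move data."""
    nranks = len(sched)
    recvs = {(r, op.peer, op.step)
             for r, ops in sched.items() for op in ops if op.kind == "RECV"}
    buf = [None] * nranks
    buf[root] = payload
    nsteps = 1 + max((op.step for ops in sched.values() for op in ops), default=-1)
    for s in range(nsteps):
        for r, ops in sched.items():
            for op in ops:
                if op.step == s and op.kind == "SEND":
                    if buf[r] is None:
                        raise RuntimeError(f"rank {r} sends before holding data")
                    if (op.peer, r, s) not in recvs:
                        raise RuntimeError(f"SEND {r}->{op.peer} has no matching RECV")
                    buf[op.peer] = buf[r]
    return buf, nsteps


if __name__ == "__main__":
    buf, nsteps = execute(binomial_bcast_schedule(8, root=3), root=3, payload="data")
    print(nsteps, buf)
```

In this sketch, each rank's list holds at most ⌈log2 N⌉ operations. For N = 100,000 that is 17, which suggests why keeping the algorithm logic in software and storing only short primitive lists in the NIC could help with the scalability goal stated above. The project's actual primitive design may differ.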