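Project title: Research on BESIII Offline Physics Software and Scheduling Strategies on Multi-core Platforms
Project number: No.11205179
Project type: Young Scientists Fund
Year of approval: 2013
Discipline: Physics II
Principal investigator: Cheng Yaodong (程耀东)
Institution: Institute of High Energy Physics, Chinese Academy of Sciences
Funding: 300,000 CNY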
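Abstract (translated from Chinese): The major upgrades of BEPCII and BESIII have been completed, and data analysis is now fully under way. The BESIII offline physics software is the foundation of and key to data analysis, and its performance determines the pace and results of that analysis. As computing technology develops, multi-core platforms have become the inevitable trend. However, the current BESIII offline physics software is developed and run for a traditional single-core processor environment, in which the basic unit of scheduling and execution is the "job". This granularity is too coarse, so computing tasks cannot make good use of multi-core resources, wasting CPU, memory, disk, and network bandwidth. To address these problems, this project studies fine-grained, process-level execution and scheduling of computing tasks. Using memory accounting together with modern operating system techniques such as CoW and KSM, it proposes a "node scheduling policy" and "pilot job execution" to replace the traditional "job scheduling" approach and improve the efficiency of job scheduling and execution. Through research on the physics software and the task scheduling system, this project will substantially improve the efficiency of physics data analysis, laying a solid foundation for the BESIII experiment to produce high-quality physics results early and quickly.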
Keywords (translated from Chinese): multi-core;offline physics software;Beijing Spectrometer;data analysis;node scheduling
Abstract (English): The upgrade of BEPCII and BESIII has been completed, and a large volume of data is being analyzed. The BESIII offline software is the foundation of the data analysis, and its performance determines the progress and results of that analysis. With the development of computing technology, multi-core computing platforms have become an inevitable trend. However, the current BESIII offline software is designed and developed for the traditional single-core processor environment. In this system, the unit of task scheduling and execution is the "job". This granularity is too coarse, which wastes multi-core computing resources such as CPU, memory, hard disk and network bandwidth. To solve these problems, this project will study the fine-grained parallel execution of computing tasks and the job scheduling policy in depth. By exploiting memory accounting and advanced techniques of modern operating systems, such as Copy-On-Write (CoW) and Kernel Samepage Merging (KSM), the project proposes a "node scheduling policy" and "pilot job execution" in place of traditional job scheduling methods, in order to improve the efficiency of job scheduling and execution. The research on the physics software and the task scheduling system will substantially increase the efficiency of massive data analysis, which helps BESIII experiment to achieve high
Keywords (English): multi-core;offline software system;BESIII;data analysis;node scheduling