This document is one of the deliverable reports created for the ESCAPE project. ESCAPE stands for Energy-efficient Scalable Algorithms for Weather Prediction at Exascale. The project develops world-class, extreme-scale computing capabilities for European operational numerical weather prediction and future climate models. This is done by identifying Weather & Climate dwarfs, which are key patterns in terms of computation and communication (in the spirit of the Berkeley dwarfs). These dwarfs are then optimized for different hardware architectures (single and multi-node) and alternative algorithms are explored. Performance portability is addressed through the use of domain-specific languages. In today's computer architectures, moving data is considerably more time- and energy-consuming than computing on this data. One of the key performance optimizations for any application is therefore to minimize data motion and maximize data reuse. On modern supercomputers with very complex and deep memory hierarchies in particular, data locality must be taken into account. When targeting accelerators with directive systems like OpenACC or OpenMP, identifying data scope, access type and data reuse is critical to minimizing data transfers to and from the accelerator. Unfortunately, manually identifying data locality information in complex code bases can be a time-consuming task, and tool support is therefore desirable. In this report we summarize the results of a survey of currently available tools that support software developers and performance engineers with data locality information in complex code bases like numerical weather prediction (NWP) or climate simulation applications. Based on the survey results, we then recommend a tool and specify some extensions for a tool to solve the problems encountered in an NWP application.