CFD users of supercomputers usually resort to rule-of-thumb methods to select the number of subdomains (partitions) when relying on MPI-based parallelization. One common approach is to set a minimum number of elements or cells per subdomain, below which the parallel efficiency of the code is "known" to fall under a subjective threshold, say 80%. The situation is even worse when the user is not aware of the "good" practices for the given code, in which case a huge amount of resources can be wasted. This work presents an elastic computing methodology that automatically adapts, at runtime, the resources allocated to a simulation. The criterion for controlling the required resources is based on a runtime measure of the communication efficiency of the execution. Guided by analytical estimates, the resources are then expanded or reduced to fulfil this criterion and, ultimately, to run an efficient simulation.
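The abstract does not state the exact efficiency metric or the analytical estimates used. As an illustration only, the sketch below assumes the POP-style definition of communication efficiency (maximum useful computation time across ranks divided by total runtime) and a deliberately simple, hypothetical cost model: per-rank computation scales as W/P while the communication cost per step stays independent of P. Under that assumption, a measured efficiency e0 at P0 ranks and a target efficiency e* give P = P0 · e0(1 − e*) / (e*(1 − e0)). The function names `communication_efficiency` and `suggest_ranks` are hypothetical and are not taken from the paper.

```python
def communication_efficiency(useful_times, runtime):
    """POP-style communication efficiency (assumed definition):
    maximum useful computation time over all ranks divided by total runtime."""
    if runtime <= 0:
        raise ValueError("runtime must be positive")
    return max(useful_times) / runtime


def suggest_ranks(p_current, eff_measured, eff_target, p_min=1, p_max=None):
    """Suggest a new rank count under a HYPOTHETICAL cost model:
    per-rank compute time W/P and a communication cost c independent of P,
    so that efficiency(P) = (W/P) / (W/P + c)."""
    if not (0.0 < eff_measured < 1.0 and 0.0 < eff_target < 1.0):
        raise ValueError("efficiencies must lie strictly between 0 and 1")
    # Solving efficiency(P) = eff_target with c calibrated from the measurement
    # at p_current eliminates W and c, leaving only the ratio below.
    ratio = eff_measured * (1.0 - eff_target) / (eff_target * (1.0 - eff_measured))
    p_new = max(p_min, round(p_current * ratio))
    # Optionally cap the request at the resources actually available.
    if p_max is not None:
        p_new = min(p_max, p_new)
    return p_new
```

For example, with 64 ranks, a measured efficiency of 0.9 and a target of 0.8, this model suggests expanding to 144 ranks; a measured efficiency of 0.6 suggests shrinking to 24. A real implementation would need a cost model calibrated to the actual code and network, which is what the paper's analytical estimates presumably provide.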