Discrete ordinates ($S_N$) transport solvers on unstructured meshes are difficult to scale because of complex data dependencies, memory access patterns, and a high-dimensional domain. In this paper, we review the performance bottlenecks in the shared-memory parallelization scheme of an existing transport solver on modern many-core architectures with high core counts. Guided by this analysis, we survey the performance of the solver across a variety of compute hardware. We then present a new Asynchronous Many-Task (AMT) algorithm for shared-memory parallelism, report results showing an increase in computational performance over the existing method, and evaluate why performance improves.