We improve the performance of multigrid solvers on many-core architectures with cache hierarchies by reorganizing the operations of the smoothing step to minimize memory transfers. We focus on patch smoothers, which offer convergence rates that are robust with respect to the finite element degree for various equations, and apply them as multiplicative subspace corrections for numerical efficiency. By fusing the computation of local residuals with the local solvers, we increase the locality of the algorithm and thus reduce data transfers. The thread-parallel implementation of this algorithm is based on coloring, which conflicts with cache efficiency. We therefore rearrange the loop into batches and order consecutive batches to prioritize data locality, so that more data can be reused.
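As an illustration of the fused residual and local solve, the following is a minimal sequential sketch of one multiplicative patch sweep. It is not the implementation described here: the 1D Poisson matrix, the helper names `poisson_1d`, `make_patches`, and `multiplicative_patch_sweep`, and the patch size and stride are all assumptions chosen for brevity, and the coloring and batching used for thread parallelism are not shown.

```python
import numpy as np

def poisson_1d(n):
    """Dense tridiagonal (2, -1) matrix; a toy SPD operator assumed for illustration."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def make_patches(n, size, stride):
    """Overlapping index sets covering 0..n-1 (hypothetical 1D stand-in for vertex patches)."""
    starts = list(range(0, max(n - size, 0) + 1, stride))
    if starts[-1] + size < n:
        starts.append(n - size)  # make sure the last unknowns are covered
    return [np.arange(s, s + size) for s in starts]

def multiplicative_patch_sweep(A, b, x, patches):
    """One multiplicative subspace-correction sweep over all patches, updating x in place."""
    for idx in patches:
        # Local residual, computed from the current iterate right before it is used,
        # so no global residual vector is ever stored or re-read.
        r_loc = b[idx] - A[idx, :] @ x
        # Local solve on the patch block, followed by an immediate update of x.
        A_loc = A[np.ix_(idx, idx)]
        x[idx] += np.linalg.solve(A_loc, r_loc)
    return x

# Usage: a few sweeps on a small problem, starting from the zero vector.
A = poisson_1d(32)
b = np.ones(32)
x = np.zeros(32)
patches = make_patches(32, size=4, stride=2)
for _ in range(5):
    multiplicative_patch_sweep(A, b, x, patches)
```

Because each patch reads the latest iterate, every local correction already sees the updates of the patches before it, which is what distinguishes the multiplicative variant from an additive one that would compute all local residuals from the same global residual.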