Ensemble data assimilation of computational fluid dynamics simulations, based on the lattice Boltzmann method (LBM) and the local ensemble transform Kalman filter (LETKF), is implemented and optimized on a GPU supercomputer equipped with NVIDIA A100 GPUs. To connect the LBM and LETKF parts, the data transpose communication is optimized by overlapping computation, file I/O, and communication according to the data dependencies of each LETKF kernel. In two-dimensional forced isotropic turbulence simulations with an ensemble size of $M=64$ and $N_x=128^2$ grid points, the optimized implementation achieved a $3.80\times$ speedup over the naive implementation, in which the LETKF part is not parallelized. The main computing kernel of the local problem is the eigenvalue decomposition (EVD) of $M\times M$ real symmetric dense matrices, which is computed by a newly developed batched EVD in $\verb|EigenG|$. The batched EVD in $\verb|EigenG|$ outperforms that in $\verb|cuSOLVER|$, achieving a $65.3\times$ speedup.
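To illustrate why a batched EVD of $M\times M$ symmetric matrices dominates the local problem, the following is a minimal CPU sketch of the standard LETKF ensemble-space update, not the GPU implementation described above. It assumes a diagonal observation error covariance and no covariance inflation; the function name `letkf_weights` and the array layout are hypothetical, and NumPy's `numpy.linalg.eigh` stands in for the batched solver.

```python
import numpy as np

def letkf_weights(Y, r_inv):
    """Solve B independent LETKF local problems with one batched EVD.

    Y     : (B, P, M) observation-space ensemble perturbations
            (B local problems, P local observations, M members).
    r_inv : (B, P) diagonal of the inverse observation error covariance
            (diagonal R is an assumption of this sketch).
    Returns the analysis covariance in ensemble space, Pa, and the
    symmetric square-root transform W, both of shape (B, M, M).
    """
    B, P, M = Y.shape
    # A = (M - 1) I + Y^T R^{-1} Y is real symmetric positive definite.
    RinvY = Y * r_inv[:, :, None]
    A = (M - 1) * np.eye(M) + np.einsum("bpm,bpn->bmn", Y, RinvY)
    # Batched EVD: eigh operates on the last two axes of a stacked array.
    lam, Q = np.linalg.eigh(A)
    # Pa = Q diag(1/lam) Q^T
    Pa = np.einsum("bmk,bk,bnk->bmn", Q, 1.0 / lam, Q)
    # W = sqrt(M - 1) Q diag(lam^{-1/2}) Q^T
    W = np.sqrt(M - 1) * np.einsum("bmk,bk,bnk->bmn", Q, lam ** -0.5, Q)
    return Pa, W
```

Each grid point contributes one independent $M\times M$ problem, so with $M=64$ the workload consists of many small symmetric EVDs, which is the case a batched solver targets.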