As modern HPC systems increasingly rely on greater core counts and wider vector registers, applications must be adapted to fully utilize these hardware capabilities. One class of applications that can benefit from this increase in parallelism is molecular dynamics simulation. In this paper, we describe our efforts to modernize the ESPResSo++ molecular dynamics simulation package by restructuring its particle data layout for efficient memory access and by applying vectorization techniques to the calculation of short-range non-bonded forces, which yields an overall 3x speedup and serves as a baseline for further optimizations. We also implement finer-grained parallelism for multi-core CPUs through HPX, a C++ runtime system that uses lightweight threads and an asynchronous many-task approach to maximize parallelism. Our goal is to evaluate the performance of this HPX-based approach against the bulk-synchronous MPI-based implementation. This requires introducing an additional layer into the domain decomposition scheme that defines the task granularity. On spatially inhomogeneous systems, which impose a corresponding load imbalance on traditional MPI-based approaches, we demonstrate that by choosing an optimal task size, the efficient work-stealing mechanisms of HPX can overcome the communication overhead, resulting in an overall 1.3x speedup over the baseline MPI version.
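To illustrate the data-layout idea, the following is a minimal sketch (not ESPResSo++ code; all names are illustrative). Storing each coordinate component in its own contiguous array (structure of arrays) gives the short-range force loop unit-stride memory accesses that compilers can auto-vectorize, in contrast to an array-of-structs layout where components are interleaved:

```cpp
#include <cstddef>
#include <vector>

// Structure-of-arrays (SoA) particle storage: one contiguous array
// per coordinate and force component.
struct ParticleArrays {
    std::vector<double> x, y, z;    // positions
    std::vector<double> fx, fy, fz; // force accumulators
};

// Lennard-Jones force on particle i from a contiguous neighbour range
// [begin, end) that is assumed not to contain i. The inner loop is
// unit-stride over the SoA arrays, making it a candidate for
// auto-vectorization. For brevity only particle i is updated; a real
// kernel exploiting Newton's third law would also update j.
void ljForceOnParticle(ParticleArrays& p, std::size_t i,
                       std::size_t begin, std::size_t end,
                       double rc2, double eps, double sig2) {
    double fxi = 0.0, fyi = 0.0, fzi = 0.0;
    #pragma omp simd reduction(+ : fxi, fyi, fzi)
    for (std::size_t j = begin; j < end; ++j) {
        const double dx = p.x[i] - p.x[j];
        const double dy = p.y[i] - p.y[j];
        const double dz = p.z[i] - p.z[j];
        const double r2 = dx * dx + dy * dy + dz * dz;
        if (r2 < rc2) {
            const double s2 = sig2 / r2;    // (sigma/r)^2
            const double s6 = s2 * s2 * s2; // (sigma/r)^6
            // F/r = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2
            const double f = 24.0 * eps * s6 * (2.0 * s6 - 1.0) / r2;
            fxi += f * dx;
            fyi += f * dy;
            fzi += f * dz;
        }
    }
    p.fx[i] += fxi;
    p.fy[i] += fyi;
    p.fz[i] += fzi;
}
```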
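Similarly, the task-based execution model can be sketched as follows, assuming the HPX headers below (header names vary across HPX versions) and treating computeBlockForces and NUM_BLOCKS as hypothetical placeholders rather than ESPResSo++ API. Each block of the additional decomposition layer becomes a lightweight task, and the HPX scheduler balances these tasks across worker threads via work stealing:

```cpp
#include <hpx/hpx_main.hpp>      // wraps main() and boots the HPX runtime
#include <hpx/include/async.hpp> // hpx::async
#include <hpx/include/lcos.hpp>  // hpx::wait_all

#include <cstddef>
#include <vector>

// Placeholder for the short-range force kernel over the cells of one
// block of the extra decomposition layer.
void computeBlockForces(std::size_t block) {
    (void)block; // ... force computation for this block ...
}

int main() {
    constexpr std::size_t NUM_BLOCKS = 64; // task-granularity knob
    std::vector<hpx::future<void>> tasks;
    tasks.reserve(NUM_BLOCKS);
    // Spawn one lightweight task per block; idle worker threads steal
    // queued tasks, absorbing spatial load imbalance.
    for (std::size_t b = 0; b < NUM_BLOCKS; ++b)
        tasks.push_back(hpx::async(computeBlockForces, b));
    hpx::wait_all(tasks); // join before the next integration step
    return 0;
}
```

The choice of NUM_BLOCKS reflects the trade-off discussed above: more blocks give the scheduler more opportunities to balance load, while fewer blocks reduce per-task and communication overhead.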