This paper presents the foundational elements of a distributed memory method for mesh generation designed to leverage the concurrency offered by large-scale computing. To achieve this goal, meshing functionality is separated from performance aspects by assigning each to a separate entity: CDT3D, a shared memory mesh generation code, and PREMA, which provides parallel runtime support. Although CDT3D is designed for scalability, lessons are presented regarding the additional measures taken to enable its integration into the distributed memory method as a black box. In the presented method, an initial mesh is decomposed into subdomains, which are distributed among the nodes of a high-performance computing (HPC) cluster. Meshing operations within CDT3D use a speculative execution model, enabling the strict adaptation of each subdomain's interior elements. Interface elements undergo several iterations of shifting so that they are adapted once their data dependencies are resolved. PREMA aids in this endeavor by providing asynchronous message passing between encapsulations of data, workload balancing, and migration capabilities, all within a globally addressable namespace. PREMA also assists in establishing data dependencies between subdomains, enabling "neighborhoods" of subdomains to perform interface shifts and adaptation independently of one another. Preliminary results show that the presented method produces meshes of comparable quality to those generated by the original shared memory CDT3D code. Given the costly overhead of collective communication observed in existing state-of-the-art software, the relative communication performance of the presented method also suggests that its emphasis on avoiding global synchronization is a potentially viable approach to achieving scalability on large configurations of cores.