In Monte Carlo neutron transport simulations of nuclear reactor physics on GPUs, particle sorting plays a significant role in execution performance. Traditionally, CPUs and GPUs are separate devices connected by a link with a low data transfer rate and high data transfer latency. Emerging computing chips tend to integrate CPUs and GPUs; one example is Apple silicon, whose chips use unified memory. Such unified-memory chips open the door to new strategies for CPU-GPU collaboration in Monte Carlo neutron transport. One such strategy is to sort particles on the CPU and transport them on the GPU, an approach that has suffered from high CPU-GPU data transfer latency on traditional devices with separate CPUs and GPUs. The results show that on the Apple M2 Max chip, sorting on the CPU leads to better performance than sorting on the GPU for the ExaSMR whole-core benchmark problems, while for the HTR-10 high-temperature gas reactor fuel pebble problem, sorting on the GPU is more efficient. Features of the partially sorted particle order have been identified as contributing to the higher performance of CPU sorting over GPU sorting for the ExaSMR problem. The in-house code, using both CPUs and GPUs, achieves 7.5 times the power efficiency of OpenMC on CPUs for the ExaSMR whole-core benchmark problems and 50 times for the HTR-10 fuel pebble benchmark problem.