We present results from parallelizing the unpacking and clustering steps applied to raw data from silicon strip modules for the reconstruction of charged particle tracks. Throughput is further improved by processing multiple events concurrently, using nested OpenMP parallelism on CPUs or CUDA streams on GPUs. Together with earlier work on a parallelized and vectorized implementation of the combinatoric Kalman filter algorithm, the new implementation enables efficient global reconstruction of the entire event on modern computer architectures. We demonstrate its performance on Intel Xeon and NVIDIA GPU architectures.
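To make the clustering step and the event-level concurrency more concrete, the following is a minimal sketch and not the implementation described above. It assumes that "clustering" means grouping contiguous strips with a signal above a threshold, and that unpacked data arrives as `(strip_index, adc)` pairs per module; the names `cluster_strips`, `cluster_event`, and `cluster_events` and the data layout are hypothetical. A Python thread pool stands in for the nested OpenMP parallelism or CUDA streams mentioned in the text, so it only illustrates the structure of processing several events at once and gives no real speedup for this pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor


def cluster_strips(hits, threshold=0):
    """Group contiguous above-threshold strips of one module into clusters.

    hits: iterable of (strip_index, adc) pairs (hypothetical unpacked format).
    Returns a list of (first_strip, n_strips, total_adc) tuples.
    """
    clusters = []
    current = None  # [first_strip, last_strip, total_adc] of the open cluster
    for strip, adc in sorted(hits):
        if adc <= threshold:
            continue  # strips at or below threshold never join a cluster
        if current is not None and strip == current[1] + 1:
            # Adjacent to the open cluster: extend it.
            current[1] = strip
            current[2] += adc
        else:
            # Gap found: close the open cluster and start a new one.
            if current is not None:
                clusters.append((current[0], current[1] - current[0] + 1, current[2]))
            current = [strip, strip, adc]
    if current is not None:
        clusters.append((current[0], current[1] - current[0] + 1, current[2]))
    return clusters


def cluster_event(event):
    """Cluster every module of one event; event maps module_id -> hits."""
    return {module_id: cluster_strips(hits) for module_id, hits in event.items()}


def cluster_events(events, max_workers=4):
    """Process several events concurrently (stand-in for OpenMP/CUDA streams)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cluster_event, events))


if __name__ == "__main__":
    events = [
        {7: [(3, 12), (4, 20), (9, 5)]},
        {7: [(1, 4)], 8: [(10, 6), (11, 7)]},
    ]
    for clustered in cluster_events(events):
        print(clustered)
```

Each event is independent of the others, which is what allows them to be handled concurrently; within an event, modules are also independent, which is where a second (nested) level of parallelism could apply.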