We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core utilization is difficult to achieve because the adaptive workload can vary greatly across the integration space and is impossible to predict a priori. Existing parallel algorithms rely on sequential computations on independent processors, which creates bottlenecks due to the need for data redistribution and processor synchronization. Our algorithm employs a high-throughput approach in which all existing sub-regions are processed and sub-divided in parallel. Repeated sub-region classification and filtering improve upon a brute-force approach and allow the algorithm to make efficient use of computation and memory resources. A CUDA implementation shows orders-of-magnitude speedup over the fastest open-source CPU method and extends the achievable accuracy for difficult integrands. Our algorithm typically outperforms other existing deterministic parallel methods.
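The process-classify-filter-subdivide loop described in the abstract can be illustrated with a minimal sequential sketch. Everything here is an assumption made for illustration, since the abstract does not specify these details: the function names `evaluate` and `integrate`, the midpoint rule used as the per-region estimator, the split along the widest axis, and the volume-proportional error budget used to classify a region as finished. It is not the CUDA implementation; the list comprehension over `active` only marks where each sub-region could be handled independently, for example by its own GPU thread or block.

```python
from math import prod


def evaluate(f, region):
    """Return (estimate, error_estimate, children) for one sub-region.

    Assumed estimator: the midpoint rule on the region compared with the
    midpoint rule on its two halves, split along the widest axis.
    """
    widths = [hi - lo for lo, hi in region]
    axis = widths.index(max(widths))
    lo, hi = region[axis]
    mid = 0.5 * (lo + hi)
    children = [
        region[:axis] + [(lo, mid)] + region[axis + 1:],
        region[:axis] + [(mid, hi)] + region[axis + 1:],
    ]

    def midpoint(r):
        centre = [0.5 * (a + b) for a, b in r]
        return prod(b - a for a, b in r) * f(centre)

    coarse = midpoint(region)
    fine = sum(midpoint(c) for c in children)
    return fine, abs(fine - coarse), children


def integrate(f, bounds, tol=1e-4, max_iters=40):
    """Breadth-first adaptive integration over a non-degenerate box."""
    total_volume = prod(hi - lo for lo, hi in bounds)
    active = [list(bounds)]
    finished_sum = 0.0
    for _ in range(max_iters):
        if not active:
            break
        # Process every active sub-region; the calls are independent of
        # one another, so this is the step a GPU would run in parallel.
        results = [evaluate(f, r) for r in active]
        # Classify each region, then filter: finished regions contribute
        # their estimate and are dropped, the rest are replaced by children.
        next_active = []
        for region, (est, err, children) in zip(active, results):
            volume = prod(hi - lo for lo, hi in region)
            if err <= tol * volume / total_volume:
                finished_sum += est
            else:
                next_active.extend(children)
        active = next_active
    # If the iteration cap was hit, fall back to the current estimates
    # of whatever regions are still active.
    finished_sum += sum(evaluate(f, r)[0] for r in active)
    return finished_sum
```

A call such as `integrate(lambda p: p[0] ** 2 + p[1] ** 2, [(0.0, 1.0), (0.0, 1.0)])` integrates over the unit square, whose exact value is 2/3. Filtering finished regions out of `active` at every pass is what keeps the working set small compared with a brute-force scheme that keeps subdividing everything.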