Kernel matrix-vector products are ubiquitous in science and engineering applications. However, the naive method requires $O(N^2)$ operations, which becomes prohibitive for large-scale problems. We introduce a parallel method that provably requires only $O(N)$ operations, reducing the computational cost. The distinctive feature of our method is that it requires only the ability to evaluate the kernel function, offering a black-box interface to users. Our parallel approach targets multi-core shared-memory machines and is implemented using OpenMP. Numerical results demonstrate up to $19\times$ speedup on 32 cores. We also present a real-world application in geostatistics, where our parallel method was used to deliver fast principal component analysis of covariance matrices.
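To make the $O(N^2)$ baseline and the black-box interface concrete, the following is a minimal sketch of the naive kernel matrix-vector product, not the $O(N)$ method introduced here. The names `gaussian_kernel` and `naive_kernel_matvec`, the choice of a Gaussian kernel, and the sample points are all illustrative assumptions; the only requirement on the kernel is that it can be evaluated on a pair of points.

```python
import math


def gaussian_kernel(x, y, length_scale=1.0):
    """Illustrative kernel (an assumption): any callable k(x, y) -> float works."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def naive_kernel_matvec(kernel, points, weights):
    """Compute y[i] = sum_j kernel(points[i], points[j]) * weights[j].

    The kernel is treated as a black box and evaluated once per pair of
    points, so the cost is N^2 kernel evaluations for N points.
    """
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    return [
        sum(kernel(xi, xj) * wj for xj, wj in zip(points, weights))
        for xi in points
    ]


# Hypothetical sample data: three points in the plane with arbitrary weights.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = [1.0, 2.0, 3.0]
result = naive_kernel_matvec(gaussian_kernel, points, weights)
```

Each output entry is independent of the others, so this baseline parallelizes trivially over `i`, but the total work still grows quadratically with the number of points, which is the cost a fast method aims to avoid.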