Whilst FPGAs have long enjoyed success in accelerating high-frequency financial workloads, their use in quantitative finance, the application of mathematical models to analyse financial markets and securities, has been far more limited to date. CPUs are currently the most common architecture for such workloads, and an important question is whether FPGAs can ameliorate some of the bottlenecks encountered on those architectures. In this paper we extend our previous work accelerating the industry-standard Securities Technology Analysis Center's (STAC\textregistered) derivatives risk analysis benchmark, STAC-A2\texttrademark{}. We first port our previous Xilinx implementation to an Intel Stratix-10 FPGA, exploring the challenges encountered when moving from one FPGA architecture to another and the suitability of techniques across the two. We then present a host-data-streaming approach that ultimately outperforms our previous version on a Xilinx Alveo U280 FPGA by up to 4.6 times and requires 9 times less energy at the largest problem size, while outperforming the CPU and GPU versions by up to 8.2 and 5.2 times respectively. The result of this work is a significant improvement in FPGA performance over the previous version of this industry-standard benchmark on both Xilinx and Intel FPGAs, together with an exploration of optimisation and porting techniques that can be applied to other HPC workloads.