Paper title
Fast and energy-efficient derivatives risk analysis: Streaming option Greeks on Xilinx and Intel FPGAs
Authors
Abstract
Whilst FPGAs have enjoyed success in accelerating high-frequency financial workloads for some time, their use in quantitative finance, which is the use of mathematical models to analyse financial markets and securities, has been far more limited to date. Currently, CPUs are the most common architecture for such workloads, and an important question is whether FPGAs can ameliorate some of the bottlenecks encountered on those architectures. In this paper we extend our previous work accelerating the industry-standard Securities Technology Analysis Center's (STAC®) derivatives risk analysis benchmark, STAC-A2™. We first port the benchmark from our previous Xilinx implementation to an Intel Stratix-10 FPGA, exploring the challenges encountered when moving from one FPGA architecture to another and the suitability of techniques. We then present a host-data-streaming approach that ultimately outperforms our previous version on a Xilinx Alveo U280 FPGA by up to 4.6 times and requires 9 times less energy at the largest problem size, while outperforming the CPU and GPU versions by up to 8.2 and 5.2 times respectively. The result of this work is a significant enhancement in FPGA performance against the previous version for this industry-standard benchmark running on both Xilinx and Intel FPGAs, and furthermore an exploration of optimisation and porting techniques that can be applied to other HPC workloads.