Hardware Acceleration of Monte Carlo Simulation for Option Pricing

Using Field Programmable Gate Arrays (FPGAs) to accelerate financial derivative calculations is becoming common.  However, the practicality of FPGAs is still limited by extended development time [1].  In this case study we benchmark the performance and development time of European option pricing using Monte Carlo simulation on an FPGA and on a CPU.  We demonstrate that the high-level graphical language of LabVIEW FPGA can accelerate financial calculations by a factor of 131 without the extended development time that is considered the norm for FPGA development.

The non-FPGA (CPU based) solution was evaluated using an Alienware Area-51 7500 Dual Core 3.0 GHz.  The FPGA solution was evaluated by using a National Instruments PXI-7854R R Series with Virtex-5 LX110 FPGA.

 

Graphical FPGA Design

Taking advantage of an FPGA typically requires the use of a Hardware Description Language (HDL) such as Verilog or VHDL.  These languages are difficult to learn and result in lengthy source files that take a lot of effort to accomplish relatively little.  For example, the VHDL code for calculating the square root of a number can take anywhere from 117 lines [2] to 396 lines [3].  The corresponding LabVIEW FPGA code looks like this:

Algorithm

We used the following algorithm for calculating the price of a European Option by using a Monte Carlo Simulation [4]:
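As a rough illustration of this kind of algorithm, here is a minimal Python sketch of the textbook Monte Carlo estimate for a European call, assuming the underlying follows geometric Brownian motion under the risk-neutral measure. The function names, parameter names, and the closed-form check are illustrative assumptions and are not taken from our LabVIEW or C# implementations.

```python
import math
import random


def price_european_call(spot, strike, rate, volatility, maturity, n_sims, seed=0):
    """Estimate a European call price by Monte Carlo under risk-neutral GBM."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * volatility ** 2) * maturity
    diffusion = volatility * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_sims):  # each iteration is independent of the others
        z = rng.gauss(0.0, 1.0)  # standard normal draw
        terminal = spot * math.exp(drift + diffusion * z)
        payoff_sum += max(terminal - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * payoff_sum / n_sims


def black_scholes_call(spot, strike, rate, volatility, maturity):
    """Closed-form reference price, used only as a sanity check."""
    def normal_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    sigma_root_t = volatility * math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * maturity) / sigma_root_t
    d2 = d1 - sigma_root_t
    return spot * normal_cdf(d1) - strike * math.exp(-rate * maturity) * normal_cdf(d2)
```

With spot 100, strike 100, a 5% rate, 20% volatility, and one year to maturity, the closed-form price is about 10.45, and an estimate from 100,000 simulations should land close to it.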

 

Design Approach

After analyzing the algorithm, we see that the best candidate for optimization is the For loop.  Since there are no dependencies between each iteration, we can use an FPGA to execute multiple simultaneous iterations with true parallelism.
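The independence of the iterations can be sketched in Python by splitting the work into lanes, each with its own random stream, and combining the partial sums at the end. This is only a model of the idea: the lanes below run one after another, whereas on an FPGA they would run simultaneously. The lane count, seeds, and default market parameters are assumptions chosen for illustration.

```python
import math
import random


def lane_payoff_sum(seed, n_sims, spot, strike, drift, diffusion):
    """Sum of call payoffs for one independent lane with its own random stream."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return total


def price_with_lanes(n_lanes, sims_per_lane, spot=100.0, strike=100.0,
                     rate=0.05, volatility=0.2, maturity=1.0):
    """Combine independent lanes into a single discounted price estimate."""
    drift = (rate - 0.5 * volatility ** 2) * maturity
    diffusion = volatility * math.sqrt(maturity)
    # No lane reads another lane's state, so the order of execution is irrelevant.
    partial_sums = [
        lane_payoff_sum(seed, sims_per_lane, spot, strike, drift, diffusion)
        for seed in range(n_lanes)
    ]
    total_sims = n_lanes * sims_per_lane
    return math.exp(-rate * maturity) * sum(partial_sums) / total_sims
```

Because the only shared step is the final sum, adding lanes increases throughput without changing the structure of the per-iteration calculation.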

Code written in LabVIEW is converted into VHDL by the FPGA Module, which then hands off to the Xilinx tools.  The Xilinx tools perform synthesis, placement, and routing of all the generated logic, and determine the maximum attainable clock rate.  The maximum clock rate determines how fast the implementation will run.

LabVIEW FPGA Implementation

Here is a screenshot of the LabVIEW code that runs on an FPGA.  The graphical code is simple and intuitive.

 

 

Here is a close-up of the Calculation VI.  VI stands for Virtual Instrument, and is the LabVIEW equivalent of a function.  Most of the per-iteration calculations occur inside this VI.  Notice the use of fixed-point math.
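For readers unfamiliar with fixed-point math, the following Python sketch shows the basic idea: a real number is stored as an integer scaled by a power of two, and a multiplication must shift the result back down. The 16 fraction bits are an assumption for illustration; the word lengths used in the Calculation VI are not stated here. Python integers never overflow, so this sketch does not model the finite word length of real hardware.

```python
FRACTION_BITS = 16          # assumed number of fraction bits, for illustration only
SCALE = 1 << FRACTION_BITS  # 2**16


def to_fixed(value):
    """Convert a float to a scaled integer, rounding to the nearest step."""
    return int(round(value * SCALE))


def to_float(fixed):
    """Convert a scaled integer back to a float."""
    return fixed / SCALE


def fixed_add(a, b):
    """Addition needs no rescaling because both operands share the same scale."""
    return a + b


def fixed_mul(a, b):
    """The raw product carries twice the fraction bits, so shift it back down."""
    return (a * b) >> FRACTION_BITS


product = fixed_mul(to_fixed(1.5), to_fixed(2.25))
print(to_float(product))  # 3.375
```

Integer adds, multiplies, and shifts map directly onto FPGA logic, which is why fixed-point is usually preferred over floating-point on these devices.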

 

The compilation completed after 16 minutes and 55 seconds, and the compiler reported a maximum attainable clock rate of 80 MHz.  At 1 simulation per clock tick, this gives 80 million simulations per second, or 80 thousand simulations per millisecond.  The device utilization map showed that this specific FPGA target could fit up to ten copies of the algorithm.  We copied the algorithm ten times on the chip and recompiled.  This compilation took much longer, but it allowed us to fully utilize the Virtex-5 FPGA chip.
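The throughput arithmetic can be checked with a few lines of Python. The sketch assumes the ten-copy design still runs at the 80 MHz reported for the single copy, which is what the final performance figures imply; the constant names are illustrative.

```python
CLOCK_HZ = 80_000_000        # maximum clock rate reported by the compiler
SIMS_PER_TICK_PER_COPY = 1   # one simulation per clock tick per copy
COPIES = 10                  # copies of the algorithm placed on the chip
N_SIMS = 100_000             # simulations per pricing run

sims_per_second = CLOCK_HZ * SIMS_PER_TICK_PER_COPY * COPIES
sims_per_ms = sims_per_second // 1000
time_ms = N_SIMS / sims_per_ms

print(sims_per_ms)  # 800000
print(time_ms)      # 0.125
```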

The final performance numbers were:

100,000 simulations per 0.125 millisecond

1.25 nanoseconds per simulation (800,000 simulations per millisecond)

C# (.NET Framework 3.5) Implementation

Here is the source code of our C# implementation:

 

The above code was compiled and run on an Alienware Area-51 7500, running Windows Vista 32-bit Service Pack 2 and .NET Framework 3.5, with 4 GB of RAM and dual 10,000 RPM 150 GB hard disks in a RAID-0 configuration.  The Windows Experience Index reported by the Windows Performance Information and Tools utility was 5.7 out of 6.0.

The benchmark ran the simulation one thousand times and the average time required for each trial was 16.394 ms, giving us the following performance:

100,000 simulations per 16.394 milliseconds

6,099 simulations per millisecond
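The averaging procedure can be sketched as a small timing harness. This is a Python illustration of the method, not the C# benchmark itself; `average_trial_ms` and `sims_per_ms` are hypothetical helper names.

```python
import time


def average_trial_ms(run_trial, n_trials):
    """Run `run_trial` n_trials times and return the mean wall-clock time in ms."""
    start = time.perf_counter()
    for _ in range(n_trials):
        run_trial()
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / n_trials


def sims_per_ms(n_sims, trial_ms):
    """Throughput implied by one trial of n_sims simulations taking trial_ms."""
    return n_sims / trial_ms


# Applying the throughput formula to the measured average of 16.394 ms:
print(int(sims_per_ms(100_000, 16.394)))  # 6099
```

Averaging over many trials smooths out scheduling and caching noise that would distort a single measurement on a desktop operating system.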

Performance Comparison

Comparing the results, we see that the FPGA implementation is 131 times faster, demonstrating that high-level graphical programming with LabVIEW FPGA did not sacrifice performance while reducing development time to several days.

Platform                           Time required for 100,000 simulations   Simulations per ms
Alienware Area-51 7500             16.394 milliseconds                     6,099
Xilinx Virtex-5 FPGA (NI-7854R)    0.125 milliseconds                      800,000
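The speedup figure follows directly from the two measured times, as this short Python check shows (the constant names are illustrative):

```python
CPU_MS = 16.394   # time for 100,000 simulations on the CPU
FPGA_MS = 0.125   # time for 100,000 simulations on the FPGA

speedup = CPU_MS / FPGA_MS
print(int(speedup))  # 131
```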

Conclusion

This case study shows that complex financial algorithms can be efficiently programmed onto FPGAs without in-depth knowledge of digital design or complex Electronic Design Automation (EDA) tools.  LabVIEW graphical programming is an intuitive way to program embedded devices because the block diagram of a LabVIEW FPGA VI can represent the parallelism and timing of embedded systems much better than text-based languages.  Using the high-level graphical development environment of LabVIEW FPGA reduced our development time without compromising the performance gains of using an FPGA.

 

References

[1] Donna Kardos Yesalavich; Trading Firms Turn To Videogame Chips To Get Even Faster; Wall Street Journal; http://online.wsj.com/article/BT-CO-20100427-716893.html; April 27, 2010.

[2] VHDL Reference Material: Simple parallel 8-bit sqrt using one component; University of Maryland, Baltimore County, Department of Computer Science & Electrical Engineering; http://www.cs.umbc.edu/portal/help/VHDL/samples/samples.shtml#sqrt8; Last updated: August 20, 2007.

[3] VHDL Reference Material: 32-bit parallel integer square root; University of Maryland, Baltimore County, Department of Computer Science & Electrical Engineering; http://www.cs.umbc.edu/portal/help/VHDL/samples/samples.shtml#sqrt32; Last updated: August 20, 2007.

[4] Hull, John C.; Options, Futures and other Derivatives; Prentice Hall; June 2005.
This article has been featured on the Tabb Forum and on National Instruments’ website.