Accelerating QuantLib with an FPGA Coprocessor

In this case study, we examine how to take QuantLib, an open-source library, and increase its performance by replacing calls to certain subroutines with calls to IP on an FPGA.  We examine what is required to connect QuantLib to the FPGA, measure the actual overhead of making many calls to the FPGA, and replicate scenarios that are representative of how the QuantLib library is used in the real world.

 

Summary

QuantLib is an open-source library for quantitative finance.  It offers tools that practitioners in the field of Financial Engineering use for advanced modeling, including Monte Carlo Simulations.  One particular Monte Carlo Simulation inside the QuantLib library corresponds to the algorithm that we implemented for another case study of ours entitled “Hardware Acceleration of Monte Carlo Simulation for Option Pricing.”  We can use the National Instruments C-to-FPGA interface API, along with the IP generated in that previous case study, to accelerate QuantLib by incorporating an FPGA into the solution.  Using high-precision benchmarking code and a sample portfolio of stocks, we can measure the performance gained by this method relative to the original QuantLib solution.

To read our Case Study “Hardware Acceleration of Monte Carlo Simulation for Option Pricing” see http://www.wallstreetfpga.com/option-pricing-on-an-fpga/.

For more information about QuantLib, see http://quantlib.org.

 

Solution Overview

We scanned the source code of QuantLib and of the included examples, searching for an implementation of a Monte Carlo Method used for the pricing of European Options.  After finding such usage inside the “EquityOption” example, we created a new program that made extensive use of the existing QuantLib function to price a portfolio of between 1 and 1,500 European Options.  We measured the performance over 200 trials and varied the number of options inside the portfolio.  See the “FPGA Hardware Used” section below for more information.
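
To make the calculation being benchmarked concrete, here is a minimal sketch of a Monte Carlo pricer for a single European Option.  It is not QuantLib's engine and not the FPGA IP; the struct, the function name, and the single-step geometric Brownian motion model are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical parameter bundle for one European option (illustrative names).
struct EuropeanOption {
    double spot;        // current price of the underlying
    double strike;      // strike price
    double rate;        // continuously compounded risk-free rate
    double volatility;  // annualized volatility
    double maturity;    // time to expiry in years
    bool   isCall;      // true for a call, false for a put
};

// Estimate the option value by sampling terminal prices under geometric
// Brownian motion and discounting the average payoff back to today.
double priceMonteCarlo(const EuropeanOption& o, std::size_t paths, unsigned seed) {
    assert(paths > 0);
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);

    const double drift = (o.rate - 0.5 * o.volatility * o.volatility) * o.maturity;
    const double diffusion = o.volatility * std::sqrt(o.maturity);

    double payoffSum = 0.0;
    for (std::size_t i = 0; i < paths; ++i) {
        // One draw gives the terminal price directly; no intermediate steps
        // are needed because a European payoff depends only on expiry.
        const double terminal = o.spot * std::exp(drift + diffusion * gauss(rng));
        const double payoff = o.isCall ? terminal - o.strike : o.strike - terminal;
        payoffSum += payoff > 0.0 ? payoff : 0.0;
    }
    return std::exp(-o.rate * o.maturity) * payoffSum / static_cast<double>(paths);
}
```

Pricing a portfolio then amounts to calling a function like this once per option, which is why the running time in the baseline below grows roughly linearly with portfolio size.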

Afterwards, we created a second program that was identical to the first except that it called a customized build of QuantLib.  In that build, we replaced calls inside of QuantLib with calls to a DLL that uses the National Instruments C-to-FPGA API to communicate with the FPGA, which performs the same calculation.  We chose this implementation method to simplify the transition of existing programs to this new FPGA-enhanced edition of QuantLib.  We also stored the results of these calculations so that we could compare them with the results from the original QuantLib library.

We used Microsoft Visual Studio 2008 Professional to compile the “Release (static runtime)” configuration of QuantLib version 0.9.9.  For the C-to-FPGA interface DLL, we used LabWindows/CVI version 9.0, and for the programming of the FPGA we used LabVIEW 2009 with the FPGA and Real-Time Modules installed.

 

Establishing a Performance Baseline

We used the Windows API QueryPerformanceCounter for high-precision benchmarking of the running time of the existing QuantLib library with various portfolio sizes.  The performance of the existing QuantLib library is as follows:

 

# of Options Calculated    Average Execution Time (ms)    Standard Deviation (ms)    Execution Time per Option (ms)
1                          104.58                         0                          104.58
100                        10,359 (10.3 seconds)          184.16                     102.17
750                        75,905 (75.9 seconds)          1,118.4                    101.6
1,000                      100,280 (100 seconds)          531.91                     100.27
1,500                      150,560 (150 seconds)          1,962.2                    100.37

 

These benchmarks were taken on an Alienware Area-51 7500-R5 High Performance Gaming machine with a 3.0 GHz Intel Core 2 Duo processor, 4 GB of high-performance RAM, and dual 10,000 RPM 150 GB hard drives in a RAID-0 configuration, running Windows 7 64-bit.
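
The measurement loop behind a table like this can be sketched as follows.  This is an illustration only: it uses the portable std::chrono::steady_clock in place of the Windows QueryPerformanceCounter call we used, and the names TimingStats, summarize, and benchmark are hypothetical.  Whether to report the sample or the population standard deviation is also a choice; the sketch assumes the sample form.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: mean and sample standard deviation in ms.
struct TimingStats {
    double meanMs;
    double stdDevMs;
};

// Reduce raw per-trial timings (in ms) to the two summary columns.
TimingStats summarize(const std::vector<double>& trialsMs) {
    assert(!trialsMs.empty());
    const double n = static_cast<double>(trialsMs.size());
    double sum = 0.0;
    for (double t : trialsMs) sum += t;
    const double mean = sum / n;

    double squaredDeviations = 0.0;
    for (double t : trialsMs) squaredDeviations += (t - mean) * (t - mean);
    // Sample variance (n - 1); defined as zero when only one trial exists.
    const double variance = trialsMs.size() > 1 ? squaredDeviations / (n - 1.0) : 0.0;
    return {mean, std::sqrt(variance)};
}

// Run `work` the requested number of times and time each run separately.
template <typename Work>
TimingStats benchmark(Work&& work, std::size_t trials) {
    std::vector<double> trialsMs;
    trialsMs.reserve(trials);
    for (std::size_t i = 0; i < trials; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        trialsMs.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return summarize(trialsMs);
}
```

In this sketch, `work` would be a callable that prices the whole portfolio once, and dividing the mean by the portfolio size gives the per-option column.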

FPGA Hardware Used

We used a National Instruments PXI-1042Q chassis with a PXI-8110 embedded controller and a PXI-7854R R Series Multifunction RIO card, which houses a Xilinx Virtex-5 LX110 FPGA.

You can read more about these products on the National Instruments website (http://www.ni.com) by searching for the model numbers listed above.

 

Connecting QuantLib to the FPGA

The National Instruments C-to-FPGA interface can be linked directly into the QuantLib DLL, but for code clarity and separation, we decided to keep it in its own DLL and to build it with the tools provided by National Instruments, namely LabWindows/CVI.
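
One way to picture this separation is a thin dispatch layer that sends a batch of option parameters either to the original CPU routine or to a routine exported by the FPGA DLL.  The sketch below is purely illustrative: OptionParams, BatchPricer, and PricingDispatcher are hypothetical names, not part of QuantLib or of the National Instruments API, and the accelerator is represented by an ordinary callable rather than a real DLL entry point.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical flat parameter record that could be passed across a DLL boundary.
struct OptionParams {
    double spot;
    double strike;
    double rate;
    double volatility;
    double maturity;
};

// Shared signature for the CPU path and the accelerator path (an assumption).
using BatchPricer =
    std::function<std::vector<double>(const std::vector<OptionParams>&)>;

class PricingDispatcher {
public:
    explicit PricingDispatcher(BatchPricer cpu) : cpu_(std::move(cpu)) {}

    // Called once the accelerator DLL has been loaded successfully.
    void attachAccelerator(BatchPricer fpga) { fpga_ = std::move(fpga); }

    bool usesAccelerator() const { return static_cast<bool>(fpga_); }

    // Callers see a single entry point regardless of which backend runs.
    std::vector<double> price(const std::vector<OptionParams>& batch) const {
        const BatchPricer& backend = fpga_ ? fpga_ : cpu_;
        std::vector<double> prices = backend(batch);
        assert(prices.size() == batch.size());  // one price per option
        return prices;
    }

private:
    BatchPricer cpu_;
    BatchPricer fpga_;  // empty until an accelerator is attached
};
```

Keeping the accelerator behind one signature like this means the calling program does not change when the FPGA path is added, which is the property we wanted from the separate DLL.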

 

Performance of FPGA Solution

We originally assumed that the overhead of placing calls inside a separate DLL and sending requests over the PCI bus would be too costly, but the results of the entire operation proved us wrong.  The FPGA was able to pass hundreds of Monte Carlo Simulation parameters inside a data window of only 35 microseconds!  Granted, this is not a latency-sensitive application, but even if it were, the 35-microsecond latency was more than enough to beat the performance of the existing QuantLib library.

 

# of Options Calculated    Average Execution Time (ms)    Standard Deviation (ms)    Execution Time per Option (ms)
1                          0.287                          0.001                      0.287
142                        12.086                         0.001                      0.085
714                        60.031                         0.004                      0.084
1,071                      89.903                         0.006                      0.084
1,428                      119.776                        0.007                      0.084

Comparison

By comparing the two tables above, we see that the overhead involved in offloading the pricing calculation to an FPGA is almost negligible, especially given that the per-option speedup over QuantLib is on the order of 1,000x for the larger portfolios (about 100 ms versus 0.084 ms per option)!  Even though the implementation inside the FPGA is not exactly the same as that of QuantLib, the results of the calculations were within 1% of each other!
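
Because the two tables use different portfolio sizes, the fair comparison is the per-option time.  The arithmetic can be sketched as below; the helper names are hypothetical, and the figures in the comments are taken directly from the tables above.

```cpp
#include <cassert>

// Per-option time from a total run time and a portfolio size.
double perOptionMs(double totalMs, int options) {
    assert(options > 0);
    return totalMs / static_cast<double>(options);
}

// How many times faster the FPGA path is, per option.
double speedup(double cpuMsPerOption, double fpgaMsPerOption) {
    assert(fpgaMsPerOption > 0.0);
    return cpuMsPerOption / fpgaMsPerOption;
}

// Largest portfolios: 100.37 ms (CPU, 1,500 options) vs 0.084 ms (FPGA, 1,428 options).
// Single option:      104.58 ms (CPU) vs 0.287 ms (FPGA).
```

With the table values, speedup(100.37, 0.084) comes to roughly 1,195 and speedup(104.58, 0.287) to roughly 364, so the fixed cost of a call to the FPGA matters most for a single option and is amortized as the portfolio grows.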

 

Conclusion

We see that by using just one FPGA chip, we can achieve the same performance as hundreds of nodes in a server farm.  Because we are not using the latest hardware offered by National Instruments, we also see vast room for improvement on these results.

In addition to the benefits mentioned above, we can upgrade our solution to a larger 18-slot PXI chassis, which can house up to 15 FPGA cards, to build a supercomputer that can handle Monte Carlo workloads requiring millions or even billions of simulations per second, all inside one 3U box that takes up half of one rack!