FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs

Modern FPGA devices offer massive parallelism and high flexibility for realizing application- or domain-specific accelerated computing. However, they remain out of reach for most software developers because they are difficult to program, even with the recent advances in C-based high-level synthesis (HLS).

This paper tackles the programmability challenge of FPGAs by introducing FCUDA, an automated compilation flow that lets users program FPGAs in CUDA, a data-parallel programming model widely used among GPU programmers. More concretely, FCUDA performs source-to-source compilation to lower a CUDA program into C/C++ code suitable for processing by the AutoESL AutoPilot HLS tool (AutoESL was later acquired by Xilinx, and AutoPilot became Vivado HLS). The compiler extracts the coarse-grained data-level parallelism exposed in the CUDA program to generate customized processing engines that execute in parallel. Each of these processing engines further exploits fine-grained instruction-level parallelism through HLS optimizations such as pipelining. The FPGA-based accelerators generated by FCUDA compare favorably against a GPU baseline in both performance and energy efficiency.
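To make the lowering idea concrete, the following is a minimal sketch of how a CUDA kernel might be rewritten as plain C++ with explicit loops over threads and thread blocks. It is an illustration only and not FCUDA's actual output: the kernel (`vec_add`), the function names `vec_add_block` and `vec_add_grid`, and the `num_engines` parameter are all hypothetical, and the round-robin assignment of blocks to engines is an assumed scheduling policy. In software the engines run one after another; in hardware each would be a separate instance operating concurrently.

```cpp
#include <cassert>

// Original CUDA kernel, shown for reference only (hypothetical example):
//   __global__ void vec_add(const float* a, const float* b, float* c, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) c[i] = a[i] + b[i];
//   }

// One thread block lowered to a sequential loop over the thread index.
// The implicit threadIdx.x becomes an ordinary loop variable. An HLS tool
// could be directed to pipeline this loop (fine-grained parallelism).
void vec_add_block(const float* a, const float* b, float* c, int n,
                   int block_idx, int block_dim) {
    for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
        int i = block_idx * block_dim + thread_idx;
        if (i < n) {
            c[i] = a[i] + b[i];
        }
    }
}

// The grid lowered to loops over thread blocks. Blocks are handed out
// round-robin to num_engines processing engines (assumed policy). The inner
// loop models engines that would execute side by side on the FPGA
// (coarse-grained parallelism); here it simply runs sequentially.
void vec_add_grid(const float* a, const float* b, float* c, int n,
                  int grid_dim, int block_dim, int num_engines) {
    assert(block_dim > 0 && num_engines > 0);
    for (int base = 0; base < grid_dim; base += num_engines) {
        for (int pe = 0; pe < num_engines; ++pe) {
            int block_idx = base + pe;
            if (block_idx < grid_dim) {
                vec_add_block(a, b, c, n, block_idx, block_dim);
            }
        }
    }
}
```

Calling `vec_add_grid(a, b, c, n, grid_dim, block_dim, num_engines)` with `grid_dim * block_dim >= n` computes the same result as the CUDA launch `vec_add<<<grid_dim, block_dim>>>(a, b, c, n)` would. This holds because the thread blocks in this kernel are independent, which is the property that lets them be mapped onto parallel engines.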

FCUDA is the first effort to demonstrate a viable source-to-source compilation flow for programming FPGAs in a data-parallel language. This work garnered much attention and received the Best Paper Award at the 2009 IEEE Symposium on Application Specific Processors. The paper has already attracted more than 200 citations according to Google Scholar. The software has also been released as open source and has inspired many follow-up works. Furthermore, the FCUDA project was well ahead of the curve: similar commercial tool flows, such as OpenCL-to-FPGA compilers, became available only 3-4 years later.

Endorsement by: Zhiru Zhang, Associate Professor, Cornell University