Nominated by: Andre DeHon
This was the first published paper to evaluate adding FPGA logic as an on-chip accelerator in a processor. Specifically, the authors considered adding a modest amount of purely combinational FPGA logic as a programmable functional unit (PFU) in a superscalar processor. The design provided an ISA instruction that invokes a custom operation selected from a space of programmable functions, and it included "demand paging" of the PFU functions. The work was complete, including a compiler that automatically extracted PFU functions from C code. The paper's quantitative analysis shows modest (mostly 10-15%) speedups on the SPECint92 suite, with one application achieving almost a 2x speedup.
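To make the demand-paging idea concrete, here is a minimal sketch under stated assumptions; it is not the paper's mechanism, whose trap and reload details are not described here. It assumes a hypothetical PFU with a fixed number of configuration slots, an instruction form that names a logical function id plus two operands and returns one result, and least-recently-used replacement when a non-resident function is requested. The class `DemandPagedPFU`, its `function_table`, and the Python lambdas standing in for FPGA configurations are all illustrative inventions.

```python
from collections import OrderedDict


class DemandPagedPFU:
    """Hypothetical PFU model: `slots` configurations can be resident at once.

    A PFU instruction names a function id. If that function is resident it
    executes immediately; otherwise a miss is counted, the least recently
    used configuration is evicted (an assumed policy), and the requested
    function is loaded before executing.
    """

    def __init__(self, slots, function_table):
        self.slots = slots
        # id -> callable; each callable stands in for a combinational configuration
        self.function_table = function_table
        # resident configurations, ordered from least to most recently used
        self.resident = OrderedDict()
        self.misses = 0

    def execute(self, func_id, a, b):
        if func_id in self.resident:
            # Hit: mark this configuration as most recently used.
            self.resident.move_to_end(func_id)
        else:
            # Miss: "page in" the configuration, evicting the LRU one if full.
            self.misses += 1
            if len(self.resident) >= self.slots:
                self.resident.popitem(last=False)
            self.resident[func_id] = self.function_table[func_id]
        return self.resident[func_id](a, b)


# Two made-up 32-bit combinational functions standing in for compiler-extracted ones.
table = {
    0: lambda a, b: (a ^ b) & 0xFF,
    1: lambda a, b: ((a << 3) | (b & 0x7)) & 0xFFFFFFFF,
}

pfu = DemandPagedPFU(slots=1, function_table=table)
for _ in range(3):
    pfu.execute(0, 5, 3)  # first call misses, the next two hit
pfu.execute(1, 1, 2)      # miss: evicts function 0
pfu.execute(0, 5, 3)      # miss again: evicts function 1
print(pfu.misses)  # 3
```

With a single slot, alternating between two functions forces a reload on every switch, which illustrates why a loop that stays within one extracted function benefits most from a small PFU.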
Shortly after this paper came several high-profile integrated FPGA accelerators (Garp, Chimera, OneChip) with more sophisticated PFUs. Later, a large body of research emerged on extracting custom instructions for these PFU-like accelerators and on creating custom instructions for ASIPs (e.g., Tensilica). Today, on-chip accelerators are common, as are SoCs combining FPGAs and processors, along with vendor-supported tools that compile FPGA accelerators from C. This was one of the pioneering works to propose a concrete, scalable PFU model and to demonstrate its early promise, even in a minimal form.