Recent advances in soft GPGPU architectures have shown that a small (<10K LUT), high performance (770 MHz) processor is possible in modern FPGAs. In this paper we architect and evaluate soft SIMT processor banked memories, which can support high bandwidth (up to 16 ports) while maintaining high speed (over 770 MHz). We compare 9 different memory architectures, including simpler multi-port memories, and run a total of 51 benchmarks (different combinations of algorithms, data sizes and processor memories) to develop a comprehensive set of data which will guide the reader in making an informed memory architecture decision for their application. Our benchmarks are comprised of matrix transpositions (memory intensive) and FFTs (split between memory accesses, floating point, and integer computations) to provide a balanced evaluation. We show that the simpler (but more memory block intensive) multi-port memories offer higher performance than the more architecturally complex banked memories for many applications, especially for smaller memories, but the effective footprint cost of the multi-port memories quickly becomes prohibitive as dataset sizes increase. Our banked memory implementation results - high bandwidth, high Fmax, and high density - can be used for other FPGA applications as well, such as HLS (High Level Synthesis).
翻译:暂无翻译