In computer architecture courses, we usually teach RISC processors using a five-stage pipeline, neglecting alternative organizations. This design choice, rooted in the 1980s technology, may not be optimal today, and it is certainly not the easiest pipeline for education. This paper examines more straightforward pipeline organizations for RISC processors that are suitable for educational purposes and for implementing embedded processors in FPGAs and ASICs. We analyze resource costs and maximum clock frequency of various designs implemented in an FPGA, using clock frequency as a performance proxy. Additionally, we validate these results with ASIC designs synthesized using the open-source SkyWater130 process. Contradictory to common wisdom, a longer pipeline (up to 5 stages) does not necessarily always increase the maximum clock frequency. In two FPGA and one ASIC implementation, we discovered that a four- or five-stage pipeline leads to a slower clock frequency than a three-stage implementation. The reason is that the width of the forwarding multiplexer in the execution stage increases with longer pipelines, which is on the critical path. We also argue that a 3-stage pipeline organization is more adequate for teaching a pipeline organization of a microprocessor.
翻译:暂无翻译