GPU architectures have become popular for executing general-purpose programs. Their many-core architecture supports a large number of threads that run concurrently to hide the latency among dependent instructions. In modern GPU architectures, each SM/core is typically composed of several sub-cores, where each sub-core has its own independent pipeline. Simulators are a key tool for investigating novel concepts in computer architecture. They must be performance-accurate and have a proper model related to the target hardware to explore the different bottlenecks properly. This paper presents a wide analysis of different parts of Accel-sim, a popular GPGPU simulator, and some improvements of its model. First, we focus on the front-end and developed a more realistic model. Then, we analyze the way the result bus works and develop a more realistic one. Next, we describe the current memory pipeline model and propose a model for a more cost-effective design. Finally, we discuss other areas of improvement of the simulator.
翻译:暂无翻译