With the end of Moore's Law, there is growing demand for rapid architectural innovation in modern processors, such as RISC-V custom extensions, to sustain performance scaling. Program sampling is a crucial step in microprocessor design because it selects representative simulation points for workload simulation. Although SimPoint has been the de facto approach for decades, the limited expressiveness of its Basic Block Vector (BBV) representation demands time-consuming manual tuning, often taking months, which impedes fast innovation and agile hardware development. This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network (GNN). NPS uses AssemblyNet for embedding generation, leveraging an application's code structure and runtime state. AssemblyNet serves as both the graph model and the neural architecture of NPS, capturing aspects of a program's behavior such as data computation, code paths, and data flow. AssemblyNet is trained on a data prefetch task that predicts consecutive memory addresses. In the experiments, NPS outperforms SimPoint by up to 63% and reduces the average error by 38%. NPS also demonstrates strong robustness along with increased accuracy, which reduces the expensive overhead of accuracy tuning. Furthermore, NPS shows higher accuracy and generality than the state-of-the-art GNN approach to code behavior learning, enabling the generation of high-quality execution embeddings.
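For context, the BBV-based baseline that the abstract contrasts against can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not SimPoint's actual implementation and not part of NPS: each execution interval is summarized by a vector of basic-block execution counts, the vectors are clustered with plain k-means, and the interval closest to each centroid is taken as a simulation point. The sketch omits steps the real tool performs, such as random projection and automatic selection of the number of clusters, and all function names and the sample data are hypothetical.

```python
import random


def normalize(bbv):
    """Scale a basic-block count vector so its entries sum to 1."""
    total = sum(bbv)
    return [x / total for x in bbv] if total else list(bbv)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means; returns (labels, centroids). Illustrative only."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assign every interval to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_simulation_points(bbvs, k, seed=0):
    """Return sorted (interval_index, weight) pairs, one per non-empty cluster.

    The weight is the fraction of intervals that the chosen interval represents.
    """
    points = [normalize(v) for v in bbvs]
    labels, centroids = kmeans(points, k, seed=seed)
    picks = []
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        if not members:
            continue  # a cluster can end up empty; skip it
        rep = min(members, key=lambda i: sq_dist(points[i], centroids[c]))
        picks.append((rep, len(members) / len(points)))
    return sorted(picks)


# Hypothetical trace: six intervals over three basic blocks, two clear phases.
example_bbvs = [
    [90, 10, 0], [80, 20, 0], [85, 15, 0],   # phase dominated by block 0
    [0, 10, 90], [0, 20, 80], [0, 15, 85],   # phase dominated by block 2
]
print(select_simulation_points(example_bbvs, k=2))
```

Because the vectors record only how often each basic block runs, two intervals with the same block counts but different data values or memory behavior look identical in this scheme, which is the kind of expressiveness limit the abstract attributes to BBVs.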