Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high performance and energy efficiency. As systems grow in complexity, fine-tuning architectural parameters across multiple sub-systems (e.g., the datapath, memory blocks at different levels of the hierarchy, interconnects, compiler optimizations, etc.) quickly results in a combinatorial explosion of the design space. This makes domain-specific customization an extremely challenging task. Prior work has explored using reinforcement learning (RL) and other optimization methods to automatically explore this large design space. However, these methods have traditionally relied on single-agent RL/ML formulations, and it is unclear how well such formulations scale as the complexity of the design space increases (e.g., full-stack System-on-Chip design). Therefore, we propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem. The key idea behind using MARL is the observation that parameters across different sub-systems are largely independent, which allows a decentralized role to be assigned to each agent. We test this hypothesis by designing a domain-specific DRAM memory controller for several workload traces. Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines such as Proximal Policy Optimization and Soft Actor-Critic across different target objectives such as low power and low latency. This work thus opens a pathway for new and promising research on MARL solutions for hardware architecture search.
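To make the decentralized formulation concrete, below is a minimal, self-contained sketch of the "one agent per sub-system parameter" idea. It is not the paper's method: the agents here are simple epsilon-greedy bandits rather than full MARL policy-gradient learners, and the parameter names (queue_depth, page_policy, scheduler, refresh_policy) and the evaluate() cost model are hypothetical stand-ins for a trace-driven DRAM memory-controller simulator. The sketch only illustrates how independent agents, each owning one knob, can be trained from a shared reward (e.g., negative power plus latency).

```python
import random
from collections import defaultdict

# Hypothetical per-sub-system design knobs; the real memory-controller
# parameters and their value ranges may differ.
DESIGN_SPACE = {
    "queue_depth":    [16, 32, 64, 128],
    "page_policy":    ["open", "closed", "adaptive"],
    "scheduler":      ["fcfs", "frfcfs"],
    "refresh_policy": ["per_bank", "all_bank"],
}

def evaluate(config):
    """Stand-in for a trace-driven DRAM simulator: returns a scalar cost
    (e.g., weighted power + latency). Replace with a real workload-trace model."""
    cost = 0.0
    cost += {16: 4, 32: 3, 64: 2, 128: 3}[config["queue_depth"]]
    cost += {"open": 2, "closed": 3, "adaptive": 1}[config["page_policy"]]
    cost += {"fcfs": 3, "frfcfs": 1}[config["scheduler"]]
    cost += {"per_bank": 1, "all_bank": 2}[config["refresh_policy"]]
    return cost

class BanditAgent:
    """One decentralized agent per design parameter: epsilon-greedy over its own
    action space, updated from the shared (team) reward."""
    def __init__(self, name, actions, eps=0.2):
        self.name, self.actions, self.eps = name, actions, eps
        self.value = defaultdict(float)   # running mean reward per action
        self.count = defaultdict(int)

    def act(self):
        if random.random() < self.eps or not self.value:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.value[a])

    def update(self, action, reward):
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]

agents = {name: BanditAgent(name, values) for name, values in DESIGN_SPACE.items()}

best_cfg, best_cost = None, float("inf")
for step in range(500):
    # Each agent independently picks the value of its own parameter.
    config = {name: agent.act() for name, agent in agents.items()}
    cost = evaluate(config)
    reward = -cost                      # shared team reward: minimize power/latency
    for name, agent in agents.items():
        agent.update(config[name], reward)
    if cost < best_cost:
        best_cfg, best_cost = config, cost

print("best config:", best_cfg, "cost:", best_cost)
```

A single-agent baseline would instead treat the full cross-product of these knobs as one flat action space, which is the combinatorial growth the decentralized formulation is meant to sidestep.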