In this article, for the first time, we propose a transformer network-based reinforcement learning (RL) method for power distribution network (PDN) optimization of high bandwidth memory (HBM). The proposed method provides an optimal decoupling capacitor (decap) design that maximizes the reduction of PDN self- and transfer impedance seen at multiple ports. An attention-based transformer network is implemented to directly parameterize the decap optimization policy. Optimality performance is significantly improved because the attention mechanism has the expressive capacity to explore the massive combinatorial space of decap assignments; moreover, it captures sequential relationships among the assignments. The computing time for optimization is dramatically reduced because the network is reusable across positions of probing ports and candidate decap locations: its context embedding process captures meta-features, including probing-port positions. In addition, the network is trained on randomly generated data sets, so the trained network can solve new decap optimization problems without additional training. Training time and data cost are also greatly reduced by the scalability of the network; thanks to its shared-weight property, the network can adapt to larger-scale problems without retraining. For verification, we compare the results with a conventional genetic algorithm (GA), random search (RS), and all previous RL-based methods. The proposed method outperforms them all in optimality performance, computing time, and data efficiency.
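As a minimal illustrative sketch (not the authors' implementation), the core idea of attention-based decap placement can be shown with scaled dot-product attention: a context vector summarizing the probing-port embeddings scores each candidate decap location, yielding a placement policy. All names and dimensions below are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_policy(port_emb, cand_emb):
    """Toy attention-based placement policy (illustrative only).

    port_emb: (num_ports, d) embeddings of probing ports
    cand_emb: (num_candidates, d) embeddings of candidate decap positions
    Returns a probability over candidate positions for the next decap.
    """
    # Context embedding: a simple mean over probing-port features,
    # standing in for the paper's context embedding process.
    context = port_emb.mean(axis=0)
    # Scaled dot-product attention scores of each candidate vs. the context.
    d = cand_emb.shape[1]
    scores = cand_emb @ context / np.sqrt(d)
    return softmax(scores)

rng = np.random.default_rng(0)
probs = attention_policy(rng.normal(size=(3, 8)), rng.normal(size=(5, 8)))
```

Because the scoring is a dot product per candidate, the same weights apply to any number of ports or candidates, which is the shared-weight property that lets such a network scale to larger problems without retraining.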