Recently, reinforcement learning has been applied to logic synthesis by formulating the operator-sequence optimization problem as a Markov decision process. However, through extensive experiments, we find that the learned policy makes decisions independently of the circuit features (i.e., states) and yields an operator sequence that is, to some extent, invariant to permutations of its operators. Based on these findings, we develop a new RL-based method that automatically recognizes critical operators and generates common operator sequences that generalize to unseen circuits. Our algorithm is verified on the EPFL benchmark, a private dataset, and an industrial-scale circuit. Experimental results demonstrate that it achieves a good balance among delay, area, and runtime, and is practical for industrial use.
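To make the formulation concrete, the following is a minimal sketch of such an MDP: states are circuit features, actions are synthesis operators, and the reward tracks quality-of-results (QoR). The operator names (ABC-style passes such as rewrite and balance) and the QoR update are illustrative assumptions; a real environment would invoke a synthesis tool to apply each operator and measure delay and area.

```python
# Minimal sketch of the MDP formulation for operator-sequence optimization.
# The operator set and QoR model are hypothetical; a real environment would
# call a logic synthesis tool (e.g., ABC) to apply each operator and
# re-measure delay/area on the circuit.
import random

OPERATORS = ["rewrite", "refactor", "balance", "resub"]  # assumed ABC-style operators

class SynthesisEnv:
    def __init__(self, horizon=10):
        self.horizon = horizon  # length of the operator sequence
        self.reset()

    def reset(self):
        self.t = 0
        self.qor = 1.0  # normalized quality-of-results (lower is better)
        return self._state()

    def _state(self):
        # Stand-in for circuit features (node count, logic levels, etc.).
        return (self.t, round(self.qor, 3))

    def step(self, action):
        # Placeholder QoR update; a real step re-synthesizes the circuit.
        self.qor *= random.uniform(0.9, 1.0)
        self.t += 1
        reward = -self.qor  # reward improvement in QoR
        done = self.t >= self.horizon
        return self._state(), reward, done

env = SynthesisEnv()
state, done = env.reset(), False
while not done:
    action = random.choice(OPERATORS)  # a learned policy would choose here
    state, reward, done = env.step(action)
```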