Theta-Resonance: A Single-Step Reinforcement Learning Method for Design Space Exploration

Given an environment (e.g., a simulator) for evaluating samples in a specified design space and a set of weighted evaluation metrics, one can use Theta-Resonance, a single-step Markov Decision Process (MDP), to train an intelligent agent that produces progressively better samples. In Theta-Resonance, a neural network consumes a constant input tensor and produces a policy as a set of conditional probability density functions (PDFs), one for sampling each design dimension. We specialize existing policy gradient algorithms in deep reinforcement learning (D-RL) to use evaluation feedback (in terms of cost, penalty, or reward) to update our policy network with robust algorithmic stability and minimal design evaluations. We study multiple neural architectures for our policy network within the context of a simple SoC design space, and we propose a method of constructing synthetic space exploration problems to compare and improve design space exploration (DSE) algorithms. Although we only present categorical design spaces, we also outline how to use Theta-Resonance to explore continuous and mixed continuous-discrete design spaces.
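
To make the single-step setup concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a plain REINFORCE update with a mean-reward baseline, a single linear layer standing in for the policy network, independent (rather than conditional) categorical distributions per design dimension, and a synthetic reward that counts matches against a hidden target design. The names `train`, `softmax`, and `target` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def train(target, n_choices, steps=500, batch=32, lr=0.1, seed=0):
    """Single-step policy gradient over a categorical design space.

    Assumption: the reward is the negative number of dimensions in which
    a sampled design differs from `target` (a synthetic stand-in for a
    simulator), and the dimensions are sampled independently.
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target)
    n_dims = len(target)
    x = np.ones(4)                              # constant input tensor
    W = np.zeros((n_dims, n_choices, x.size))   # linear "policy network"

    for _ in range(steps):
        probs = softmax(W @ x)                  # shape (n_dims, n_choices)

        # One episode = one sampled design; draw a batch of designs.
        samples = np.stack(
            [rng.choice(n_choices, size=batch, p=probs[d]) for d in range(n_dims)],
            axis=1,
        )                                       # shape (batch, n_dims)

        # Evaluate each design and subtract a mean baseline.
        rewards = -(samples != target).sum(axis=1).astype(float)
        adv = rewards - rewards.mean()

        # Gradient of log pi w.r.t. logits for a categorical: onehot - probs.
        onehot = np.eye(n_choices)[samples]     # shape (batch, n_dims, n_choices)
        grad_logits = ((onehot - probs) * adv[:, None, None]).mean(axis=0)

        # Chain rule through logits = W @ x, then gradient ascent on reward.
        W += lr * grad_logits[:, :, None] * x

    return softmax(W @ x)


if __name__ == "__main__":
    policy = train(target=[2, 0, 3, 1], n_choices=4)
    print(policy.argmax(axis=1))
```

Because the input is constant, the network's output depends only on its parameters, so each training step reshapes one fixed set of distributions rather than reacting to a state. Modeling the conditional PDFs described above would require feeding earlier sampled dimensions into later ones, which this sketch omits.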
