Reinforcement learning is well studied under discrete actions. The integer-action setting is common in industry yet remains challenging due to its high dimensionality. To this end, we study reinforcement learning under integer actions by combining the Soft Actor-Critic (SAC) algorithm with an integer reparameterization. Our key observation is that the discrete structure of integer actions can be simplified by exploiting their comparability. As a result, the proposed integer reparameterization requires no one-hot encoding and is of low dimensionality. Experiments show that the proposed SAC under integer actions matches its continuous-action counterpart on robot control tasks and outperforms Proximal Policy Optimization on power distribution system control tasks.
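To make the comparability idea concrete, here is a minimal sketch of how an ordered integer range can be parameterized by a single scalar latent per action dimension instead of a one-hot vector. The affine-map-plus-rounding construction below is an illustrative assumption, not the paper's exact reparameterization (which, for training, would additionally need a differentiable surrogate such as a straight-through gradient):

```python
import numpy as np

def integer_action(u, low, high):
    """Map a continuous latent u in [-1, 1] (e.g. a tanh-squashed Gaussian
    sample, as in SAC) onto the integer range {low, ..., high}.

    Because integers are ordered (comparable), one scalar latent per action
    dimension suffices; there is no need for a one-hot vector over the
    (high - low + 1) choices, so the parameterization stays low-dimensional.
    """
    scaled = low + (u + 1.0) * 0.5 * (high - low)   # affine map to [low, high]
    return np.clip(np.rint(scaled), low, high).astype(int)

# Example: a 3-dimensional integer action, each component in {0, ..., 10}
u = np.array([-1.0, 0.0, 1.0])
print(integer_action(u, 0, 10))
```

Note the contrast with one-hot encoding: a 10-dimensional action with 11 integer levels per dimension would need a 110-dimensional one-hot representation, whereas this parameterization keeps it at 10 scalars. The sketch covers only the forward map; gradient flow through the rounding step is what the paper's reparameterization must additionally handle.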